NONPARAMETRIC QUASI-MAXIMUM LIKELIHOOD
ESTIMATION FOR GAUSSIAN LOCALLY STATIONARY
PROCESSES

By Rainer Dahlhaus and Wolfgang Polonik 

Universität Heidelberg and University of California, Davis

This paper deals with nonparametric maximum likelihood esti- 
mation for Gaussian locally stationary processes. Our nonparametric 
MLE is constructed by minimizing a frequency domain likelihood 
over a class of functions. The asymptotic behavior of the resulting 
estimator is studied. The results depend on the richness of the class 
of functions. Both sieve estimation and global estimation are consid- 
ered. 

Our results apply, in particular, to estimation under shape con- 
straints. As an example, autoregressive model fitting with a mono- 
tonic variance function is discussed in detail, including algorithmic 
considerations. 

A key technical tool is the time-varying empirical spectral process
indexed by functions. For this process, a Bernstein-type exponential 
inequality and a central limit theorem are derived. These results for 
empirical spectral processes are of independent interest. 

1. Introduction. Nonstationary time series whose behavior is locally close
to the behavior of a stationary process can often be successfully described
by models with time-varying parameters, that is, by models characterized
by parameter curves. A simple example is the time-varying AR(1) model
$X_t + \alpha_t X_{t-1} = \sigma_t \varepsilon_t$, $t \in \mathbb{Z}$, where $\alpha_t$ and $\sigma_t$ vary over time. If the process is
observed at times $t = 1, \dots, n$, the problem of estimation of $\alpha_t$ and $\sigma_t$ may be
formulated as the estimation of the curves $\alpha(\cdot)$ and $\sigma(\cdot)$ with $\alpha(t/n) = \alpha_t$,
$\sigma(t/n) = \sigma_t$ in an adequately rescaled model. To study such problems in a
more general framework, Dahlhaus [9] introduced the class of locally
stationary processes having a time-varying spectral representation or,
alternatively, an infinite time-varying moving average representation. In this paper,
we present a methodology for nonparametric ML-estimation of time-varying
spectral densities of Gaussian locally stationary processes. Results for
parameter functions like $\alpha(\cdot)$ or $\sigma(\cdot)$ then follow from the former results for
spectral densities. The time-varying AR(1)-process from above will serve as
a simple example for our general results.
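
To make the rescaled time-varying AR(1) model concrete, the following minimal Python sketch simulates one path. The function name `simulate_tvar1`, the burn-in scheme and the two parameter curves are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def simulate_tvar1(alpha, sigma, n, burn_in=200, seed=0):
    """Simulate X_t + alpha(t/n) X_{t-1} = sigma(t/n) eps_t for t = 1, ..., n."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn_in)
    x_prev = 0.0
    # Burn-in with the curves frozen at u = 0 (an initialization choice, not from the paper).
    for s in range(burn_in):
        x_prev = -alpha(0.0) * x_prev + sigma(0.0) * eps[s]
    x = np.empty(n)
    for t in range(1, n + 1):
        u = t / n  # rescaled time in (0, 1]
        x_prev = -alpha(u) * x_prev + sigma(u) * eps[burn_in + t - 1]
        x[t - 1] = x_prev
    return x

# Hypothetical parameter curves: |alpha(u)| <= 0.5 < 1 and sigma increasing.
alpha = lambda u: 0.5 * np.sin(2 * np.pi * u)
sigma = lambda u: 0.5 + 1.5 * u
x = simulate_tvar1(alpha, sigma, n=2000)
```

With an increasing $\sigma(\cdot)$, the second half of the simulated path is visibly more variable than the first, which is the kind of local behavior the parameter curves are meant to capture.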

Guo et al. [16] consider an approach for nonparametric estimation of 
the time-varying spectral density using both a penalized least squares and 
a penalized likelihood approach. For nonparametric estimation of curves
such as $\alpha(\cdot)$ and $\sigma(\cdot)$ in the above example, different approaches have been
considered. One idea is to utilize a stationary method on overlapping small
segments of the time series (e.g., a Yule-Walker or least squares estimate)
where the resulting estimate is regarded as the estimate of the curve at the 
midpoint of the interval. More generally, one can consider kernel estimates 
[11] or local linear fits, as in [17]. Other methods are based on wavelets, as 
in [12] and in [13]. 

In contrast to these local methods, we here consider a global method by 
fitting parameter curves from a given class of functions. Such a method is 
of particular interest when shape restrictions are known, as, for instance, in 
case of earthquake data or chirp signals where some of the parameter func- 
tions are known to be monotonic or unimodal (cf. Section 3). We fit such 
curves by maximizing an appropriate likelihood function over a class of suit- 
able candidate functions. By choosing the class of functions in an adequate 
way, different estimates can be obtained. We consider both sieve estimates 
and global estimates in function spaces. The likelihood used is a minimum 
distance functional between spectral densities in the "time-frequency" do- 
main, meaning that the spectral densities are functions of both time and 
frequency. The likelihood considered here can be regarded as a generaliza- 
tion of the classical Whittle likelihood [24] to locally stationary processes. 

The basic technical tool for deriving rates of convergence of the non- 
parametric maximum likelihood estimator is an exponential inequality for 
the time-varying empirical spectral process of a locally stationary process 
(cf. [14]). 

Non- and semiparametric inference has received a lot of attention dur- 
ing the last decade. A general approach uses minimum contrast estimation, 
where some contrast functional is minimized over an infinite-dimensional 
parameter space, including maximum likelihood estimation, M-estimation, 
least squares estimates in nonparametric regression (e.g., [2, 3, 7, 21, 22]). 
The theory for all of these approaches is based on the behavior of some 
kind of empirical process whose analysis crucially depends on exponential
inequalities (or concentration inequalities) together with measures of
complexity of the parameter space, such as metric entropy conditions or VC
indices. The theory often leads to (almost) optimal rates of convergence for 
the estimates. 

It turns out that by using our approach to nonparametric ML-estimation 
for locally stationary processes, it is possible to follow some of the main 
steps of the approaches mentioned above. However, the statistical problem, 
the likelihood under consideration, the underlying empirical process and, 
hence, the technical details, are quite different. For instance, our contrast 
functional turns out to be equivalent to an $L_2$-distance in the time-frequency
domain (instead of the Hellinger distance, as in the case of van de Geer). 
Further, we do not exploit metric entropy with bracketing, since the time- 
varying empirical spectral process is not monotone in its argument. This, in 
fact, led us to also consider sieve estimation. In addition, there is, of course, 
the complex dependence structure for locally stationary processes which, for 
example, enters when proving exponential inequalities for the increments of 
the empirical spectral process. 

In Section 2, we describe the estimation problem and the construction 
of the likelihood and we present the main results on rate of convergence of 
our nonparametric likelihood estimates. In Section 3, the estimation of a 
monotonic variance function in a time-varying AR-model is studied, includ- 
ing explicit algorithms involving isotonic regression. In Section 4, we prove 
a Bernstein-type exponential inequality for the function-indexed empirical 
spectral process for Gaussian locally stationary processes. This exponen- 
tial inequality is used to derive maximal inequalities and a functional limit 
theorem. All proofs are deferred, without further reference, to Section 5. The
Appendix contains some auxiliary results. 

2. Basic ideas and the main result. 

2.1. Locally stationary processes. In this paper, we assume that the
observed process is Gaussian and locally stationary. Locally stationary
processes were introduced in [9] by using a time-varying spectral representation.
In contrast to this, we use a time-varying MA($\infty$)-representation and
formulate the assumptions in the time domain. As in nonparametric regression,
we rescale the functions in time to the unit interval in order to achieve a 
meaningful asymptotic theory. The setup is more general than, for example, 
in [9] since we allow for jumps in the parameter curves by assuming bounded 
variation instead of continuity in the time direction. 



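Let

$$V(g) = \sup\Big\{ \sum_{k=1}^{m} |g(x_k) - g(x_{k-1})| : 0 \le x_0 < \cdots < x_m \le 1,\ m \in \mathbb{N} \Big\}$$

be the total variation of a function $g$ on $[0, 1]$, and for some $\kappa > 0$, let

$$\ell(j) := \begin{cases} 1, & |j| \le 1, \\ |j| \log^{1+\kappa}|j|, & |j| > 1. \end{cases}$$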



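Definition 2.1 (Locally stationary processes). The sequence $X_{t,n}$, $t = 1, \dots, n$, is a locally stationary process if it has the representation

$$X_{t,n} = \sum_{j=-\infty}^{\infty} a_{t,n}(j)\, \varepsilon_{t-j}, \qquad (1)$$

where the $\varepsilon_t$ are identically distributed with $E\varepsilon_t = 0$, $E\varepsilon_s \varepsilon_t = 0$ for $s \ne t$, $E\varepsilon_t^2 = 1$ and where the following conditions hold:

$$\sup_t |a_{t,n}(j)| \le \frac{K}{\ell(j)} \quad \text{(with $K$ not depending on $n$)} \qquad (2)$$

and there exist functions $a(\cdot, j) : (0, 1] \to \mathbb{R}$ with

$$\sup_u |a(u,j)| \le \frac{K}{\ell(j)}, \qquad (3)$$

$$\sup_j \sum_{t=1}^{n} \Big| a_{t,n}(j) - a\Big(\frac{t}{n}, j\Big) \Big| \le K, \qquad (4)$$

$$V(a(\cdot, j)) \le \frac{K}{\ell(j)}. \qquad (5)$$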

If the process $X_{t,n}$ is Gaussian (as in this paper), it can be shown that
the $\varepsilon_t$ also have to be Gaussian.

The above conditions are discussed in [14]. A simple example of a process
$X_{t,n}$ which fulfills the above assumptions is $X_{t,n} = \phi(t/n) Y_t$, where
$Y_t = \sum_j a(j)\varepsilon_{t-j}$ is stationary with $|a(j)| \le K/\ell(j)$ and $\phi$ is of bounded variation.
In [14], Theorem 2.3, we have shown that time-varying ARMA (tvARMA)
models whose coefficient functions are of bounded variation are locally
stationary in the above sense. In particular, it follows from this result that the
system of difference equations

$$X_{t,n} + \sum_{j=1}^{p} \alpha_j\Big(\frac{t}{n}\Big) X_{t-j,n} = \sigma\Big(\frac{t}{n}\Big) \varepsilon_t, \qquad (6)$$

where $\varepsilon_t$ are i.i.d. with $E\varepsilon_t = 0$ and $E|\varepsilon_t| < \infty$, all $\alpha_j(\cdot)$ as well as $\sigma^2(\cdot)$
are of bounded variation, $1 + \sum_{j=1}^{p} \alpha_j(u) z^j \ne 0$ for all $u$ and all $z$ such that
$0 < |z| \le 1 + \delta$ for some $\delta > 0$, has a locally stationary solution which is
called tvAR process.

Definition 2.2 (Time-varying spectral density and covariance). Let
$X_{t,n}$ be a locally stationary process. The function

$$f(u,\lambda) := \frac{1}{2\pi} |A(u,\lambda)|^2$$

with

$$A(u,\lambda) := \sum_{j=-\infty}^{\infty} a(u,j) \exp(-i\lambda j)$$

is the time-varying spectral density, and

$$c(u,k) := \int_{-\pi}^{\pi} f(u,\lambda) \exp(i\lambda k)\, d\lambda = \sum_{j=-\infty}^{\infty} a(u,k+j)\, a(u,j) \qquad (7)$$

is the time-varying covariance of lag $k$ at rescaled time $u$.

For instance, the time-varying spectral density of a tvAR($p$) process is
given by

$$f(u,\lambda) = \frac{\sigma^2(u)}{2\pi} \Big| 1 + \sum_{j=1}^{p} \alpha_j(u) \exp(i\lambda j) \Big|^{-2}. \qquad (8)$$

2.2. The estimator. Our nonparametric estimator of the time-varying
spectral density of a locally stationary process will be defined as a minimum
contrast estimator in the time-frequency domain, that is, we minimize a
contrast functional between a nonparametric estimate of the time-varying
spectral density $f(u,\lambda)$ and a class of candidate spectral density functions.
Two different scenarios are considered: (i) Sieve estimation, where the classes
of candidate spectral densities $\mathcal{F}_n$ depend on $n$, are "finite-dimensional"
and approximate, as $n$ gets large, a (large) target class $\mathcal{F}$, and (ii) global
estimation, where the contrast functional is minimized over an
"infinite-dimensional" target class $\mathcal{F}$ directly, or, formally, $\mathcal{F}_n = \mathcal{F}$ for all $n$. The
nonparametric (sieve) maximum likelihood estimate for $f$ is defined by

$$\hat f_n = \operatorname*{argmin}_{g \in \mathcal{F}_n} \mathcal{L}_n(g),$$

where our contrast functional is

$$\mathcal{L}_n(g) = \frac{1}{n} \sum_{t=1}^{n} \frac{1}{4\pi} \int_{-\pi}^{\pi} \Big\{ \log g\Big(\frac{t}{n},\lambda\Big) + \frac{J_n(t/n,\lambda)}{g(t/n,\lambda)} \Big\}\, d\lambda. \qquad (9)$$

Here, $J_n$ denotes a nonparametric estimate of the time-varying spectral
density, defined as

$$J_n\Big(\frac{t}{n},\lambda\Big) = \frac{1}{2\pi} \sum_{k:\, 1 \le [t+1/2 \pm k/2] \le n} X_{[t+1/2+k/2],n}\, X_{[t+1/2-k/2],n} \exp(-i\lambda k). \qquad (10)$$

This estimate is called pre-periodogram [19]. It can be regarded as a
preliminary estimate of $f(t/n,\lambda)$; however, in order to become consistent, it has to be
smoothed in the time and frequency directions. If we choose $g(u,\lambda) = g(\lambda)$,
that is, we model the underlying time series as stationary, then $\mathcal{L}_n(g)$ is
identical to the classical Whittle likelihood. This follows since the classical
periodogram is a time average of the pre-periodogram.
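
The last statement can be checked numerically. The following sketch evaluates the pre-periodogram (10) directly from its definition, with $[x]$ read as the largest integer not exceeding $x$, and compares its time average with the classical periodogram. The function names are hypothetical, and the direct double loop is chosen for transparency rather than speed.

```python
import numpy as np

def pre_periodogram(x, t, lam):
    """J_n(t/n, lam) as in (10); t is 1-based and x holds X_{1,n}, ..., X_{n,n}."""
    n = len(x)
    total = 0.0 + 0.0j
    for k in range(-(n - 1), n):
        r = (2 * t + 1 + k) // 2  # [t + 1/2 + k/2], floor via integer arithmetic
        s = (2 * t + 1 - k) // 2  # [t + 1/2 - k/2]
        if 1 <= r <= n and 1 <= s <= n:
            total += x[r - 1] * x[s - 1] * np.exp(-1j * lam * k)
    # Terms for k and -k are complex conjugates, so the sum is real.
    return total.real / (2 * np.pi)

def periodogram(x, lam):
    """Classical periodogram |sum_t X_t exp(-i lam t)|^2 / (2 pi n)."""
    n = len(x)
    t = np.arange(1, n + 1)
    return np.abs(np.sum(x * np.exp(-1j * lam * t))) ** 2 / (2 * np.pi * n)
```

Each pair of time indices $(r, s)$ appears exactly once, at $t = [(r+s)/2]$ with $k = r - s$, which is why averaging the pre-periodogram over $t = 1, \dots, n$ reproduces the periodogram. Individual values of $J_n(t/n,\lambda)$ need not be positive.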
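Below, we prove convergence of $\hat f_n$ to

$$f_{\mathcal{F}} = \operatorname*{argmin}_{g \in \mathcal{F}} \mathcal{L}(g), \qquad (11)$$

where

$$\mathcal{L}(g) = \int_0^1 \frac{1}{4\pi} \int_{-\pi}^{\pi} \Big\{ \log g(u,\lambda) + \frac{f(u,\lambda)}{g(u,\lambda)} \Big\}\, d\lambda\, du. \qquad (12)$$

This is, up to a constant, the asymptotic Kullback-Leibler information
divergence between two Gaussian locally stationary processes with mean zero
and time-varying spectra $f(u,\lambda)$ and $g(u,\lambda)$ (cf. [8], Theorem 3.4). Since
$\mathcal{L}(g) \ge \int_0^1 \frac{1}{4\pi} \int_{-\pi}^{\pi} \{\log f(u,\lambda) + 1\}\, d\lambda\, du$, we have

$$f_{\mathcal{F}} = f \iff f \in \mathcal{F},$$

provided the minimizer in (11) is unique (a.s. uniqueness of the minimizer
follows in the case $f \in \mathcal{F}$ from the inequality $\log x < x - 1$ for all $x \ne 1$).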
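
The lower bound on $\mathcal{L}(g)$ can be illustrated on a grid. The sketch below approximates (12) by a Riemann sum for a tvAR(1) spectral density and two misspecified candidates. The helper names, the grid sizes and the particular parameter curves are assumptions chosen for illustration.

```python
import numpy as np

def ar1_density(alpha, sigma2, lam):
    """tvAR(1) spectral density sigma^2 / (2 pi (1 + alpha^2 + 2 alpha cos lam))."""
    return sigma2 / (2 * np.pi * (1 + alpha ** 2 + 2 * alpha * np.cos(lam)))

def limit_contrast(g, f, du, dlam):
    """Riemann-sum approximation of L(g) in (12) on a regular (u, lam) grid."""
    return np.sum(np.log(g) + f / g) * du * dlam / (4 * np.pi)

u = np.linspace(0.0, 1.0, 50, endpoint=False) + 0.01
lam = np.linspace(-np.pi, np.pi, 256, endpoint=False)
U, LAM = np.meshgrid(u, lam, indexing="ij")
du, dlam = u[1] - u[0], lam[1] - lam[0]

# Hypothetical true curves and two wrong candidates.
f_true = ar1_density(0.5 * np.sin(2 * np.pi * U), (0.5 + 1.5 * U) ** 2, LAM)
g_white = ar1_density(0.0, 1.0, LAM)  # stationary white noise
g_scaled = ar1_density(0.5 * np.sin(2 * np.pi * U), 1.3 * (0.5 + 1.5 * U) ** 2, LAM)

L_true = limit_contrast(f_true, f_true, du, dlam)
L_white = limit_contrast(g_white, f_true, du, dlam)
L_scaled = limit_contrast(g_scaled, f_true, du, dlam)
```

Because $\log g + f/g \ge \log f + 1$ holds pointwise with equality only at $g = f$, the discretized contrast is smallest at the true density, mirroring the argument in the text.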

We now give three examples of possible model classes $\mathcal{F}$. In these examples
and in all of what follows, candidate spectral densities are denoted by $g(u,\lambda)$.
The true spectral density is always denoted by $f(u,\lambda)$.

Example 2.3 (Model classes for the time-varying spectrum). (a) The
locally stationary process is parameterized by curves $\theta(u) = (\theta_1(u), \dots, \theta_d(u))' \in \Theta$
and $\mathcal{F}$ consists of all spectral densities of the form $g_\theta(u,\lambda) = w(\theta(u),\lambda)$ for
some fixed function $w$ as, for instance, in the case of tvAR models discussed
above.

(b) The process is stationary, that is, $g(u,\lambda) = g(\lambda)$, and the spectral
density $g(\lambda)$ is the curve to be estimated nonparametrically.

(c) Both the behavior in time and in frequency are modeled
nonparametrically. An example is the amplitude-modulated process $X_{t,n} = \phi(t/n) Y_t$, where
$Y_t$ is stationary. In this case, $g(u,\lambda) = g_1(u) g_2(\lambda)$, where $g_1(u) = \phi(u)^2$ and
$g_2(\lambda)$ is the spectral density of $Y_t$.

In the above examples, the curves $\theta_j(u)$, $g(\lambda)$, $g_1(u)$ and $g_2(\lambda)$ are assumed
to lie in 'smoothness' classes, like Sobolev classes or classes defined through
shape restrictions (see Section 3 below).

2.3. Asymptotic properties of the NPMLE. We now motivate and
formulate rates of convergence for the NPMLE. It turns out that sieve estimation
leads to rates of convergence for the NPMLE which, in an i.i.d. density
estimation setup, are known to be (almost) optimal. In the case of global
estimation, the obtained rates are slower. Whether the same rates as for
sieve estimation can be obtained for global estimation is an open question.

We start with some elementary calculations which demonstrate the
structure of the problem, the importance of the empirical spectral process and
the fact that the $L_2$-norm of the inverse spectral densities is a natural norm
for studying the convergence of the NPMLE.

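First, we define the empirical spectral process by

$$E_n(\phi) = \sqrt{n}\, \big( F_n(\phi) - F(\phi) \big), \qquad (13)$$

where

$$F(\phi) = \int_0^1 \int_{-\pi}^{\pi} \phi(u,\lambda) f(u,\lambda)\, d\lambda\, du \qquad (14)$$

and

$$F_n(\phi) = \frac{1}{n} \sum_{t=1}^{n} \int_{-\pi}^{\pi} \phi\Big(\frac{t}{n},\lambda\Big) J_n\Big(\frac{t}{n},\lambda\Big)\, d\lambda. \qquad (15)$$

In the following motivation, we only consider the case $\mathcal{F}_n = \mathcal{F}$ for all $n$.
The case of sieve estimation is similar in nature [see proof of Theorem 2.6,
part (I)]. By definition of $\hat f_n$ and $f_{\mathcal{F}}$, we have

$$\mathcal{L}_n(\hat f_n) \le \mathcal{L}_n(f_{\mathcal{F}}) \qquad (16)$$

and, similarly,

$$\mathcal{L}(f_{\mathcal{F}}) \le \mathcal{L}(\hat f_n). \qquad (17)$$

Combining (16) and (17), we obtain the basic inequalities

$$0 \le \mathcal{L}(\hat f_n) - \mathcal{L}(f_{\mathcal{F}}) \le (\mathcal{L}_n - \mathcal{L})(f_{\mathcal{F}}) - (\mathcal{L}_n - \mathcal{L})(\hat f_n)
= \frac{1}{4\pi} \frac{1}{\sqrt{n}} E_n\Big( \frac{1}{f_{\mathcal{F}}} - \frac{1}{\hat f_n} \Big) + R_{\log}(f_{\mathcal{F}}) - R_{\log}(\hat f_n), \qquad (18)$$

where

$$R_{\log}(g) := \frac{1}{4\pi} \int_{-\pi}^{\pi} \Big[ \frac{1}{n} \sum_{t=1}^{n} \log g\Big(\frac{t}{n},\lambda\Big) - \int_0^1 \log g(u,\lambda)\, du \Big]\, d\lambda. \qquad (19)$$

Hence, if $\sup_{g \in \mathcal{F}} |R_{\log}(g)|$ is small, the convergence of $\mathcal{L}(\hat f_n) \to \mathcal{L}(f_{\mathcal{F}})$ can
be controlled by the empirical spectral process whose properties will be
investigated in Section 4, leading to the subsequent convergence results.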
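Note that in the correctly specified case where $f_{\mathcal{F}} = f$,

$$\mathcal{L}(\hat f_n) - \mathcal{L}(f) = \frac{1}{4\pi} \int_0^1 \int_{-\pi}^{\pi} \Big\{ \log \frac{\hat f_n(u,\lambda)}{f(u,\lambda)} + \frac{f(u,\lambda)}{\hat f_n(u,\lambda)} - 1 \Big\}\, d\lambda\, du, \qquad (20)$$

which equals the Kullback-Leibler information divergence between two
Gaussian locally stationary processes (cf. [8]). Under certain assumptions, the
equivalence of the above information divergence to $\rho_2(1/\hat f_n, 1/f)^2$ is shown
below (Lemma 5.1), where $\rho_2(\phi,\psi) = \rho_2(\phi - \psi)$ with

$$\rho_2(\phi) = \Big( \int_0^1 \int_{-\pi}^{\pi} |\phi(u,\lambda)|^2\, d\lambda\, du \Big)^{1/2}. \qquad (21)$$

Hence, properties of the empirical spectral process lead, via (18), to the
convergence results for $\rho_2(1/\hat f_n, 1/f)$ stated in Theorem 2.6. The above
discussion shows that these results are also convergence results for the
Kullback-Leibler information divergence. More generally, we allow for misspecification
in our results, which means that we do not require $f_{\mathcal{F}} = f$. In this case,
additional convexity arguments come into play (cf. Lemma 5.1). In order to
formulate the assumptions on the class $\mathcal{F}$, we need to introduce further
notation. With

$$\hat\phi(u,j) := \int_{-\pi}^{\pi} \phi(u,\lambda) \exp(i\lambda j)\, d\lambda, \qquad (22)$$

let

$$\rho_\infty(\phi) := \sum_{j=-\infty}^{\infty} \sup_u |\hat\phi(u,j)|, \qquad \tilde v(\phi) := \sup_j V(\hat\phi(\cdot,j)) \quad \text{and}$$

$$v_\Sigma(\phi) := \sum_{j=-\infty}^{\infty} V(\hat\phi(\cdot,j)). \qquad (23)$$

Furthermore, let

$$\mathcal{F}^* = \Big\{ \frac{1}{g} ;\ g \in \mathcal{F} \Big\}.$$

Since the empirical process has $1/g$ as its argument, it is more natural to
use the class $\mathcal{F}^*$ instead of $\mathcal{F}$ in most of the assumptions. For our sieve
estimate, we also need a sequence of approximating classes denoted by $\mathcal{F}_n$.
The corresponding classes of inverse functions are denoted by $\mathcal{F}_n^*$. In the
results on global estimation, $\mathcal{F}_n^* = \mathcal{F}^*$.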

Assumption 2.4. (a) The classes $\mathcal{F}_n$ are such that $\hat f_n$ exists for all $n$,
and $\mathcal{F}$ is such that $f_{\mathcal{F}}$ exists, is unique and $0 < f_{\mathcal{F}} < \infty$.

(b) For any $\phi \in \mathcal{F}^*$, there exists a sequence $\pi_n(\phi) \in \mathcal{F}_n^*$ such that $\rho_2(\phi, \pi_n(\phi)) \to 0$
as $n \to \infty$.

(c) There exist $0 < M_* \le 1 \le M^* < \infty$ with $M_* \le |\phi(u,\lambda)| \le M^*$ for all
$u, \lambda$ and $\phi \in \mathcal{F}_n^*$. Furthermore, $\sup_{\phi \in \mathcal{F}_n^*} \rho_\infty(\phi) \le \rho_\infty < \infty$, $\sup_{\phi \in \mathcal{F}_n^*} \tilde v(\phi) \le \tilde v < \infty$
and $\sup_{\phi \in \mathcal{F}_n^*} v_\Sigma(\phi) \le v_\Sigma < \infty$. All constants may depend on $n$.

The bounds in (c) are not very restrictive. For instance, for tvAR
processes, only finitely many $\hat\phi(u,j)$ are different from zero; see Example 2.7
below. $M_*$ (the uniform upper bound on the model spectra) is only needed
for bounding $R_{\log}(g)$ in Lemma A.2. This bound can be avoided by a
condition on the variation of $\int_{-\pi}^{\pi} \log g(u,\lambda)\, d\lambda$ which in some cases already follows
from other assumptions; see Example 2.7 and Section 3.1 below. In that case,
the constant $M_*$ in Theorem 2.6 can be replaced by 1. We mention the
following elementary relationships:

$$\sup_{u,\lambda} |\phi(u,\lambda)| \le \frac{1}{2\pi} \rho_\infty(\phi), \qquad \tilde v(\phi) \le v_\Sigma(\phi),$$

$$\tilde v(\phi) \le \int_{-\pi}^{\pi} V(\phi(\cdot,\lambda))\, d\lambda, \qquad \rho_2(\phi) \le \frac{1}{\sqrt{2\pi}} \rho_\infty(\phi).$$

Our results for the NPMLE are derived under conditions on the richness
of the model class $\mathcal{F}^*$, as measured by metric entropy. For each $\epsilon > 0$, the
covering number of a class of functions $\Phi$ with respect to the metric $\rho_2$ is
defined as

$$N(\epsilon, \Phi, \rho_2) = \inf\{ n \ge 1 : \exists\, \phi_1, \dots, \phi_n \in \Phi \text{ such that } \forall\, \phi \in \Phi\ \exists\, 1 \le i \le n \text{ with } \rho_2(\phi, \phi_i) < \epsilon \}. \qquad (24)$$

The quantity $H(\epsilon, \Phi, \rho_2) = \log N(\epsilon, \Phi, \rho_2)$ is called the metric entropy of $\Phi$
with respect to $\rho_2$. For technical reasons, we assume that $H(\epsilon, \Phi, \rho_2) \le \tilde H_\Phi(\epsilon)$
with $\tilde H_\Phi(\cdot)$ continuous and monotonically decreasing. This assumption is
known to be satisfied for many function classes (see Example 2.7). A crucial
quantity is the covering integral

$$\int_0^{\delta} \tilde H_\Phi(u)\, du. \qquad (25)$$

In contrast to (25), the standard covering integral is defined to be the integral
over the square root of the metric entropy. Here, we have to use this larger
covering integral which leads to slower rates of convergence as compared
to nonparametric ML-estimation based on i.i.d. data (cf. the discussion in
Section 2.4).

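Remark 2.5 (Measurability). We will not discuss measurability here.
All random quantities considered are assumed to be measurable. In the
case where $\mathcal{F}^*$ is nonseparable, this measurability assumption may be an
additional restriction.

Theorem 2.6 (Rates of convergence). Let $X_{t,n}$ be a Gaussian locally
stationary process. Let $\mathcal{F}_n$, $\mathcal{F}$ be classes of functions satisfying
Assumption 2.4.

Part I (Sieve estimation) Suppose that there exist constants $A > 0$, $k_n \ge 1$
with $\log N(\eta, \mathcal{F}_n^*, \rho_2) \le A k_n \log(n/\eta)$ for all $\eta > 0$. Let $c_n = \max\{\rho_\infty, v_\Sigma, (M^*)^2\}$.
If $f \in \mathcal{F}$, we have, with $a_n = \inf_{g \in \mathcal{F}_n} \rho_2(1/f, 1/g)$, that

$$\rho_2\Big( \frac{1}{\hat f_n}, \frac{1}{f} \Big) = O_P(\delta_n),$$

where $\delta_n$ satisfies

$$\delta_n = \frac{M^*}{M_*^{1/2}} \max\Big( \sqrt{\frac{c_n k_n \log n}{n}},\ a_n \Big). \qquad (26)$$

If $f \notin \mathcal{F}$, the same result holds, with $a_n$ replaced by $b_n = \rho_2(1/f_{\mathcal{F}_n}, 1/f_{\mathcal{F}})$,
provided that all $\mathcal{F}_n^*$ are convex.

Part II (Global estimation) Let $\mathcal{F}_n = \mathcal{F}$. Assume either $f \in \mathcal{F}$ or $\mathcal{F}^*$ to
be convex. Further, assume that there exist $0 < \gamma < 2$ and $0 < A < \infty$ such
that for all $\eta > 0$,

$$\tilde H_{\mathcal{F}^*}(\eta) \le A \eta^{-\gamma}. \qquad (27)$$

Then

$$\rho_2\Big( \frac{1}{\hat f_n}, \frac{1}{f_{\mathcal{F}}} \Big) = O_P(\delta_n),$$

where

$$\delta_n = \begin{cases} n^{-\frac{1}{2(\gamma+1)}} & \text{for } 0 < \gamma < 1, \\ n^{-\frac{2-\gamma}{4\gamma}} (\log n)^{\frac{\gamma-1}{2\gamma}} & \text{for } 1 \le \gamma < 2. \end{cases}$$

Remark. In Part I, the nonrandom term $a_n$ is smaller than $b_n$. Furthermore,
(an upper bound of) $a_n$ may be easier to calculate.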

Example 2.7. The above results are now illustrated in the correctly
specified case for the tvAR(1) model

$$X_{t,n} + \alpha\Big(\frac{t}{n}\Big) X_{t-1,n} = \sigma\Big(\frac{t}{n}\Big) \varepsilon_t \qquad (28)$$

with independent Gaussian innovations $\varepsilon_t$ satisfying $E\varepsilon_t = 0$, $\mathrm{Var}(\varepsilon_t) = 1$,
$\sigma(\cdot) > 0$ and $\sup_u |\alpha(u)| < 1$, with $\alpha(\cdot)$ smooth and $\sigma(\cdot)$ of bounded
variation. These assumptions ensure that the corresponding time-varying spectral
density $f(u,\lambda)$ exists and $\frac{1}{f(u,\lambda)} = \frac{2\pi}{\sigma^2(u)} (1 + \alpha^2(u) + 2\alpha(u)\cos(\lambda))$ [see (8)].
We will assume that $\alpha(\cdot) \in \mathcal{A}$ and $\sigma^2(\cdot) \in \mathcal{D}$, where $\mathcal{A}$ and $\mathcal{D}$ are model
classes. This leads to

$$\mathcal{F}^* = AR(1; \mathcal{A}, \mathcal{D}) = \Big\{ \frac{1}{g_{\alpha,\sigma^2}(u,\lambda)} = \frac{2\pi}{\sigma^2(u)} (1 + \alpha^2(u) + 2\alpha(u)\cos(\lambda));\ \alpha \in \mathcal{A};\ \sigma^2 \in \mathcal{D} \Big\}.$$

Proof. Global estimation. For simplicity, we assume here that $\sigma^2$ is a
constant, that is, we choose

$$\mathcal{D} = \Big\{ \sigma^2(\cdot) \equiv c;\ \epsilon^2 \le c \le \frac{1}{\epsilon^2} \Big\}$$

for some $0 < \epsilon < 1$. We assume further that $\alpha$ is a member of the Sobolev
space $S^m$ with smoothness parameter $m \in \mathbb{N}$ such that the first $m \ge 1$
derivatives exist and have finite $L_2$-norms. To ensure $1 + \alpha(u) z \ne 0$ for all
$0 < |z| \le 1$ [cf. (6) ff.], we choose

$$\mathcal{A} = \Big\{ h(\cdot) \in S^m;\ \sup_u |h(u)| < 1 \Big\}.$$

The metric entropy of $\mathcal{A}$ can be bounded by $A \eta^{-1/m}$ for some $A > 0$ [4].
It follows (under additional constraints on the model; see below for more
details) that the metric entropy of the corresponding class of reciprocal
spectral densities can be bounded by $\tilde H(\eta) = \tilde A \eta^{-1/m}$ for some $\tilde A > 0$. Hence,
Theorem 2.6 gives us rates of convergence (by putting $\gamma = 1/m$). These
rates are suboptimal and we can obtain faster rates via sieve estimation as
we illustrate below.

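Sieve estimation. Let $0 < \epsilon_n < 1$. We will assume $\epsilon_n \to 0$ as $n \to \infty$. We
choose

$$\mathcal{D} = \mathcal{D}_n = \Big\{ \sigma^2(\cdot) \equiv c;\ \epsilon_n^2 \le c \le \frac{1}{\epsilon_n^2} \Big\}.$$

For $0 < \epsilon_n < 1$ and $k_n$ a positive integer, let

$$\mathcal{A}_n = \Big\{ \alpha(u) = a_0 + \sum_{j=1}^{k_n} (a_j \cos(2\pi j u) + b_j \sin(2\pi j u));\ u \in [0,1];\ \sup_{u \in [0,1]} |\alpha(u)| < 1 \Big\},$$

where $a_0, a_1, \dots, a_{k_n}, b_1, \dots, b_{k_n} \in \mathbb{R}$, and let

$$\mathcal{F}_n^* = AR(1; \mathcal{A}_n, \mathcal{D}_n).$$

It follows that $\sup_{u,\lambda} \phi(u,\lambda) = O(1/\epsilon_n^2)$ uniformly in $\phi \in \mathcal{F}_n^*$ and that $M^* =
O(1/\epsilon_n^2)$. Note that we do not need the lower bound $M_*$: Kolmogorov's
formula (cf. [5], Chapter 5.8) implies for all $u$ that $\int_{-\pi}^{\pi} \log g_{\alpha,\sigma^2}(u,\lambda)\, d\lambda =
2\pi \log(\sigma^2/(2\pi))$, leading to $R_{\log}(g_{\alpha,\sigma^2}) = 0$. As mentioned below
Assumption 2.4, Theorem 2.6 can now be applied with $M_* = 1$. Further,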



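$$\hat\phi(u,0) = \frac{2\pi}{\sigma^2} (1 + \alpha^2(u)) \le \frac{4\pi}{\epsilon_n^2},$$

$$|\hat\phi(u,1)| = |\hat\phi(u,-1)| = \frac{2\pi}{\sigma^2} |\alpha(u)| \le \frac{2\pi}{\epsilon_n^2},$$

$$\hat\phi(u,j) = 0 \quad \text{for } |j| \ge 2.$$

Consequently, $\rho_\infty$ and $v_\Sigma$ are of order $O(1/\epsilon_n^2)$ and it follows that $c_n =
O((M^*)^2) = O(1/\epsilon_n^4)$. As a finite-dimensional linear space of uniformly bounded
functions, the metric entropy of $\mathcal{A}_n$ can, for small $\eta > 0$, be bounded by
$A k_n \log(1/(\epsilon_n \eta))$ and, hence, a similar upper bound of $\tilde A k_n \log(1/(\epsilon_n^2 \eta))$ holds
for the metric entropy of $\mathcal{F}_n^*$. Finally, we determine the approximation
error $a_n$. First, note that for $\epsilon_n \to 0$, we have $\sigma^2 \in \mathcal{D}_n$ for sufficiently large $n$.
Further, for $1/g_{\alpha,\sigma^2} \in \mathcal{F}^*$ and $1/g_{\alpha_n,\sigma^2} \in \mathcal{F}_n^*$, we have
$\rho_2(1/g_{\alpha,\sigma^2}, 1/g_{\alpha_n,\sigma^2}) = O(\rho_2(\alpha, \alpha_n))$.
It is well known that the approximation error of the sieve $\mathcal{A}_n$
in $S^m$ is of the order $k_n^{-m}$ (e.g., see [3, 18, 21]). Hence, we can choose the
approximating function $\pi_n(1/f)$ such that as $\epsilon_n \to 0$, we have

$$\rho_2\Big( \frac{1}{f}, \pi_n\Big(\frac{1}{f}\Big) \Big) = O\Big( \frac{1}{k_n^m} \Big).$$

In other words, if $\epsilon_n \to 0$, we have $a_n = O(1/k_n^m)$. We now choose the free
parameters $k_n$, $\epsilon_n$ in order to balance the two terms in the definition of $\delta_n$.
This leads us to the rate $\delta_n = \epsilon_n^{-4} (n/\log n)^{-m/(2m+1)}$. Choosing $\epsilon_n$ of the
order $(\log n)^{-a}$ for some $a > 0$ gives us a rate which (up to a log term)
equals the optimal rate known from the i.i.d. case. □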



Finally, we state that the same rates of convergence hold if the estimate
is obtained by minimizing an approximation of the likelihood $\mathcal{L}_n(g)$. An
example with a conditional likelihood function is given in Section 3.2.

Theorem 2.8 (Likelihood approximations). Let $\tilde{\mathcal{L}}_n(g)$ be a criterion
function with

$$\sup_{g \in \mathcal{F}_n} |\tilde{\mathcal{L}}_n(g) - \mathcal{L}_n(g)| = o_P\big( \delta_n^2 / (M^*)^2 \big). \qquad (29)$$

Then Theorem 2.6 holds with $\hat f_n$ replaced by $\tilde f_n = \operatorname*{argmin}_{g \in \mathcal{F}_n} \tilde{\mathcal{L}}_n(g)$.

2.4. Discussion. Why both sieve estimation and global estimation? The
reason for considering both sieve and global estimation is more or less
technical. In contrast to the standard empirical process, the (time-varying)
empirical spectral process $E_n(\phi)$ is not monotonic in its argument $\phi$, that is,
$\phi(u,\lambda) < \psi(u,\lambda)$ for all $u, \lambda$ does not imply $E_n(\phi) \le E_n(\psi)$, since $J_n(u,\lambda)$ is not
necessarily positive. This implies that the "bracketing idea" from standard
empirical process theory cannot be applied. For this reason, we cannot fully
exploit our Bernstein-type exponential inequality (36) below; essentially, we
can only use the (less strong) exponential inequality (37). Therefore, we
have to work with a covering integral which is the integral of the metric
entropy instead of the square root of the metric entropy. As a consequence,
we obtain slower rates of convergence. Our sieve estimators, however, do
not suffer from this problem. At least if the model is correctly specified,
then, as has been demonstrated in Example 2.7 and in Section 3 below, the
sieve estimators achieve the same rates of convergence as the corresponding
NPMLE of a probability density function based on i.i.d. data which, in this
setting, are almost (i.e., up to log terms) optimal.

3. Estimation under shape constraints. Here, we consider the special
case of a correctly specified model with constant AR-coefficients and
monotonically increasing variance function. Our model spectral densities are hence
of the form

$$g_{\alpha,\sigma^2}(u,\lambda) = \frac{\sigma^2(u)}{2\pi\, w_\alpha(\lambda)}, \qquad w_\alpha(\lambda) = \Big| 1 + \sum_{j=1}^{p} \alpha_j \exp(i\lambda j) \Big|^2,$$

where the AR-coefficients $\alpha_1, \dots, \alpha_p$ lie in the set

$$A_p = \Big\{ \alpha = (\alpha_1, \dots, \alpha_p)' \in \mathbb{R}^p : 1 + \sum_{j=1}^{p} \alpha_j z^j \ne 0 \text{ for all } 0 < |z| \le 1 \Big\}$$

[cf. (6)]. We assume that $\sigma^2(\cdot) \in \mathcal{M}$, where

$$\mathcal{M} = \Big\{ s^2 : [0,1] \to (0,\infty);\ \text{increasing with } 0 < \inf_u s^2(u) \le \sup_u s^2(u) < \infty \Big\}.$$

With this notation, our model assumption can be formalized as

$$\mathcal{F}^* = AR(p; A_p, \mathcal{M}) = \Big\{ \phi(u,\lambda) = \frac{2\pi}{\sigma^2(u)} w_\alpha(\lambda);\ \alpha \in A_p,\ \sigma^2(\cdot) \in \mathcal{M} \Big\}.$$

This model, with a unimodal rather than an increasing variance function, has
been used in [6] for discriminating between earthquakes and explosions based
on seismographic time series. To keep the exposition somewhat simpler, we
shall only consider the special case of a monotonic instead of a unimodal
variance.

Global estimation. Similarly to above, global estimation will lead to
suboptimal rates of convergence. Since the class of bounded monotonic
functions has a metric entropy satisfying $\log N(\eta, \mathcal{M}, \rho_2) \le A \eta^{-1}$, and since the
class $A_p$ is finite-dimensional and hence its metric entropy is much smaller,
it follows from Theorem 2.6 that our global NPMLE converges with rate
$\delta_n = n^{-1/4}$ (provided all assumptions of this theorem are satisfied). As it
turns out, this rate is suboptimal and can be improved upon by using sieve
estimation. For this reason, we do not go into further detail concerning global
estimation.

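3.1. Sieve estimation. We first give a sieve for $\mathcal{M}$. For $k_n \in \mathbb{N}$ and
$0 < \epsilon_n < 1$, let $\mathcal{C}_n = \mathcal{C}_n(\epsilon_n, k_n)$ denote the set of all increasing functions on $[0, 1]$,
piecewise constant on the intervals $(\frac{j-1}{k_n}, \frac{j}{k_n}]$, $j = 1, \dots, k_n$, and bounded
from above and below by $1/\epsilon_n^2$ and $\epsilon_n^2$, respectively. Formally,

$$\mathcal{C}_n = \Big\{ s^2(\cdot) : s^2(u) = \sum_{j=1}^{k_n} a_j\, \mathbf{1}\Big( u \in \Big(\frac{j-1}{k_n}, \frac{j}{k_n}\Big] \Big);\ \epsilon_n^2 \le a_1 \le a_2 \le \cdots \le a_{k_n} \le \frac{1}{\epsilon_n^2} \Big\}.$$

With these definitions, our sieve now becomes

$$\mathcal{F}_n^* = AR(p; A_p, \mathcal{C}_n).$$

Sieve estimation of the spectral density. The next theorem states that
we obtain with an appropriate choice of $\epsilon_n$ the known rate of $n^{-1/3}$ (up to
a log term) for the NPMLE of the spectral density. This rate is known to
be optimal for estimating a monotonic density based on i.i.d. data. Again,
the proof is contained in Section 5.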

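Theorem 3.1. Let $X_{t,n}$ be a Gaussian locally stationary process and $\mathcal{F}^*$
and $\mathcal{F}_n^*$ be as defined above with $k_n = O(n^{1/3} (\log n)^{-2/3})$ and $\epsilon_n = (\log n)^{-1/5}$.
If $f \in \mathcal{F}$, then we have

$$\rho_2\Big( \frac{1}{\hat f_n}, \frac{1}{f} \Big) = O_P(n^{-1/3} \log n).$$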

Sieve estimation of the monotonic variance function. Next, we see that
the above results for estimating the (inverse) spectral densities provide
information about estimating the monotonic function $\sigma^2(\cdot)$ itself. We show that
the rates of convergence from Theorem 3.1 translate to rates for $\hat\sigma_n^2$. It can
also be shown that the estimators of the finite-dimensional AR-parameters
have a $\sqrt{n}$-rate and are asymptotically normal. This is not considered here,
however.

Let

$$(\alpha_0, \sigma_0^2(\cdot)) = \operatorname*{argmin}_{(\alpha,\sigma^2) \in A_p \times \mathcal{M}} \mathcal{L}(g_{\alpha,\sigma^2})$$

be the theoretically 'optimal' parameters and

$$(\hat\alpha_n, \hat\sigma_n^2(\cdot)) = \operatorname*{argmin}_{(\alpha,\sigma^2) \in A_p \times \mathcal{C}_n} \mathcal{L}_n(g_{\alpha,\sigma^2})$$

be the sieve estimate.

Theorem 3.2. Let $X_{t,n}$ be a Gaussian locally stationary process and
let $\mathcal{F}^*$ and $\mathcal{F}_n^*$ be as defined above, with $k_n = O(n^{1/3} (\log n)^{-2/3})$ and $\epsilon_n =
(\log n)^{-1/5}$. If $f \in \mathcal{F}$, then

$$\rho_2\Big( \frac{1}{\hat\sigma_n^2}, \frac{1}{\sigma_0^2} \Big) = O_P(n^{-1/3} \log n). \qquad (30)$$



3.2. An estimation algorithm. Here, we discuss how to calculate a close
approximation to the above $(\hat\alpha_n, \hat\sigma_n^2)$. The approximation considered is

$$(\tilde\alpha_n, \tilde\sigma_n^2(\cdot)) = \operatorname*{argmin}_{(\alpha,\sigma^2) \in A_p \times \mathcal{C}_n} \tilde{\mathcal{L}}_n(\alpha,\sigma^2),$$

where

$$\tilde{\mathcal{L}}_n(\alpha,\sigma^2) = \frac{1}{n} \sum_{t=p+1}^{n} \Big\{ \log \sigma^2\Big(\frac{t}{n}\Big) + \frac{1}{\sigma^2(t/n)} \Big( X_{t,n} + \sum_{j=1}^{p} \alpha_j X_{t-j,n} \Big)^2 \Big\} \qquad (31)$$

is the so-called conditional Gaussian likelihood. By using Theorem 2.8, we
now conclude that the minimizer has the same rate of convergence.

Proposition 3.3. Let $X_{t,n}$ be a Gaussian locally stationary process and
let $\mathcal{F}^*$ and $\mathcal{F}_n^*$ be as defined above, with $k_n = O(n^{1/3} (\log n)^{-2/3})$ and $\epsilon_n =
(\log n)^{-1/5}$. If $f \in \mathcal{F}$, then

$$\sup_{(\alpha,\sigma^2) \in A_p \times \mathcal{C}_n} \Big| \frac{1}{2} \{ \tilde{\mathcal{L}}_n(\alpha,\sigma^2) - \log(2\pi) \} - \mathcal{L}_n(g_{\alpha,\sigma^2}) \Big| = o_P\big( \delta_n^2 / (M^*)^2 \big). \qquad (32)$$

Hence, all assertions of Theorem 3.1 and Theorem 3.2 also hold for
$\tilde f_n = g_{\tilde\alpha_n, \tilde\sigma_n^2}$ and $\tilde\sigma_n^2$, respectively.

We now present our algorithm for calculating $(\tilde\alpha_n, \tilde\sigma_n^2(\cdot))$. Although global
estimation is suboptimal, we first discuss the algorithm in this case in order
to concentrate on the main ideas. The same ideas apply to sieve estimates,
as will be indicated below.

Observe that for each fixed $\sigma^2(\cdot)$, minimizing $\tilde{\mathcal{L}}_n(\alpha,\sigma^2)$ over $\alpha \in A_p$ is a
weighted least squares problem. On the other hand, for each given $\alpha$, the
minimizer over $\sigma^2(\cdot) \in \mathcal{M}$ can also be found explicitly. In fact, for each fixed
$\alpha$, the minimizer

$$\hat\sigma_\alpha^2(\cdot) = \operatorname*{argmin}_{\sigma^2 \in \mathcal{M}} \tilde{\mathcal{L}}_n(\alpha,\sigma^2)$$

is given by the generalized isotonic regression to the squared residuals $e_t^2(\alpha) =
(X_{t,n} + \sum_{j=1}^{p} \alpha_j X_{t-j,n})^2$. Note that there are no residuals for $t \le p$ and,
hence, the estimator is only defined for $t \ge p + 1$. It follows that for $t \ge
p + 1$, the estimator $\hat\sigma_\alpha^2(t/n)$ can be calculated as the (right) derivative
of the greatest convex minorant to the cumulative sum diagram given by
$\{(0, 0), (t - p, \sum_{s=p+1}^{t} e_s^2(\alpha)),\ t = p + 1, \dots, n\}$, by using the
pool-adjacent-violators algorithm (PAVA). This follows from the theory of isotonic
regression (cf. [20]). For completeness, let us briefly mention the relevant theory.
Consider the expression

$$\sum_{t=p+1}^{n} \big( \Phi(x_t) - \Phi(y_t) - \varphi(y_t)(x_t - y_t) \big), \qquad (33)$$

where $\Phi$ is a convex function with derivative $\varphi$. The theory of isotonic
regression now implies that the minimizer of (33) over $(y_{p+1}, \dots, y_n) \in \mathcal{K} =
\{(y_{p+1}, \dots, y_n) : y_{p+1} \le \cdots \le y_n\}$ is given by the (right) slope of the greatest
convex minorant to the cumulative sums of the $x_t$'s, and it can be calculated
by means of the PAVA. With $\Phi(x) = -\log x$, $x_t = e_t^2(\alpha)$ and $y_t = \sigma^2(t/n)$,
we obtain

$$\Phi(x_t) - \Phi(y_t) - \varphi(y_t)(x_t - y_t) = -\log e_t^2(\alpha) + \log \sigma^2(t/n) + \frac{e_t^2(\alpha) - \sigma^2(t/n)}{\sigma^2(t/n)}.$$

Consequently, for fixed $\alpha$, minimizing (33) over $\mathcal{K}$ is equivalent to minimizing
$\tilde{\mathcal{L}}_n(\alpha,\sigma^2)$ over all monotone functions $\sigma^2(\cdot) \in \mathcal{M}$. The global minimizer is
then found by minimizing the profile likelihood $\tilde{\mathcal{L}}_n(\alpha, \hat\sigma_\alpha^2)$ over $\alpha$. Note
that this is a continuous function in $\alpha$. This can be seen by observing that
the squared residuals depend continuously on $\alpha$. Hence, at each fixed point
$u = t/n$, the (right) slope of the greatest convex minorant, that is, $\hat\sigma_\alpha^2(t/n)$,
is also a continuous function in $\alpha$. Therefore, we can conclude that the
minimizer exists, provided the minimization is extended over a compact set
of $\alpha$'s (as in the sieve estimation case).
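
The isotonic step can be sketched in a few lines of Python. The implementation below is a generic pool-adjacent-violators routine for a nondecreasing fit, written for illustration; the function name `pava_increasing` and the optional weights are assumptions and do not reproduce the authors' code.

```python
import numpy as np

def pava_increasing(y, w=None):
    """Weighted isotonic (nondecreasing) regression of y by pool-adjacent-violators.

    Returns the block-wise weighted means, i.e. the slopes of the greatest
    convex minorant of the cumulative sum diagram of y.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    means, weights, sizes = [], [], []
    for yi, wi in zip(y, w):
        means.append(yi)
        weights.append(wi)
        sizes.append(1)
        # Pool the last two blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            w_tot = weights[-2] + weights[-1]
            m = (weights[-2] * means[-2] + weights[-1] * means[-1]) / w_tot
            s = sizes[-2] + sizes[-1]
            means[-2:] = [m]
            weights[-2:] = [w_tot]
            sizes[-2:] = [s]
    return np.repeat(means, sizes)
```

Applied to squared residuals $e_t^2(\alpha)$, the output plays the role of $\hat\sigma_\alpha^2(t/n)$, $t = p+1, \dots, n$. With equal weights the fitted values sum to the same total as the inputs, and for positive inputs they minimize $\sum_t \{\log y_t + x_t / y_t\}$ over nondecreasing sequences, which is the statement derived from (33).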

The basic algorithm for finding the global minimizers $(\hat\alpha, \hat\sigma^2) \in A_p \times \mathcal M$ is
now given by the following iteration, which results in a sequence $(\alpha_{(k)}, \sigma^2_{(k)})$,
$k = 1, 2, \ldots$, with decreasing values of $\tilde{\mathcal L}_n$. Given a starting value $\sigma^2_{(0)}$, the
iteration for $k = 1, 2, \ldots$ is as follows:

(i) Given $\sigma^2_{(k-1)}(\cdot)$, find $\alpha_{(k)}$ by solving the corresponding weighted least
squares problem

$\alpha_{(k)} = \operatorname{argmin}_{\alpha} \sum_{t=p+1}^{n} \frac{1}{\sigma^2_{(k-1)}(t/n)} \Bigl( X_{t,n} + \sum_{j=1}^{p} \alpha_j X_{t-j,n} \Bigr)^2.$

(ii) Find $\sigma^2_{(k)}(\cdot)$ as the solution to the PAVA using the squared residuals
$e_t^2(\alpha_{(k)})$, as outlined above.
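The alternating scheme of steps (i) and (ii) can be sketched as follows. This is a hedged illustration under stated assumptions rather than the authors' code: the residual is taken to be $e_t(\alpha) = X_t + \sum_j \alpha_j X_{t-j}$, the criterion recorded is the average of $\log \sigma^2(t/n) + e_t^2(\alpha)/\sigma^2(t/n)$, the simulated tvAR(1) data and the names `fit_tvar_monotone_variance` and `floor` are hypothetical, and the small floor on the variance is a numerical safeguard that is not part of the algorithm in the text.

```python
import numpy as np

def pava_increasing(y):
    """Nondecreasing least-squares fit (equal weights) by pooling adjacent violators."""
    blocks = []  # each block: [mean, count]
    for yi in np.asarray(y, dtype=float):
        blocks.append([yi, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(n1 * m1 + n2 * m2) / (n1 + n2), n1 + n2])
    return np.concatenate([np.full(k, m) for m, k in blocks])

def fit_tvar_monotone_variance(x, p, n_iter=10, floor=1e-8):
    """Alternate weighted least squares in alpha and PAVA in sigma^2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    target = x[p:]                                              # X_t, t = p+1..n
    lags = np.column_stack([x[p - j:n - j] for j in range(1, p + 1)])  # X_{t-j}
    # Starting value: PAVA applied to the squared raw data.
    sigma2 = np.maximum(pava_increasing(target ** 2), floor)
    history = []
    for _ in range(n_iter):
        # (i) weighted least squares: minimize sum_t (X_t + lags @ alpha)^2 / sigma2_t
        sw = 1.0 / np.sqrt(sigma2)
        alpha, *_ = np.linalg.lstsq(lags * sw[:, None], -target * sw, rcond=None)
        # (ii) isotonic regression of the squared residuals
        resid2 = (target + lags @ alpha) ** 2
        sigma2 = np.maximum(pava_increasing(resid2), floor)
        history.append(float(np.mean(np.log(sigma2) + resid2 / sigma2)))
    return alpha, sigma2, history

# Hypothetical simulated data: X_t + 0.5 X_{t-1} = sigma(t/n) * eps_t, sigma increasing.
rng = np.random.default_rng(0)
n, alpha_true = 2000, 0.5
sigma = 0.5 + np.arange(1, n + 1) / n
x = np.zeros(n)
for t in range(1, n):
    x[t] = -alpha_true * x[t - 1] + sigma[t] * rng.standard_normal()

alpha_hat, sigma2_hat, history = fit_tvar_monotone_variance(x, p=1)
```

Each half-step minimizes the criterion in one argument with the other held fixed, so the recorded values in `history` should not increase from one iteration to the next as long as the floor is inactive.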

A reasonable starting value $\sigma^2_{(0)}(\cdot)$ is the solution of the PAVA using the
squared raw data. This algorithm is applied in [6] to the problem of discrimination
of time series.

The corresponding minimizer of the conditional Gaussian likelihood over
the sieve parameter space $A_p \times \mathcal C_n$ can be found similarly. First, note that the
above solution is a piecewise constant function with jump locations in the
set of points $\{t/n,\ t = p+1, \ldots, n\}$. Our sieve, however, consists of piecewise
constant functions with jump locations in the set $\{j/k_n,\ j = 1, \ldots, k_n\}$. In order
to find the minimizer of the conditional likelihood over this sieve, the only
change one has to make to the above algorithm is to apply the PAVA to
the cumulative sum diagram based on {(0, 0), ( *^^^~^ , Z]s=p+i — 

rM^l,...,M, where t(j) = r^l. 

Note that in the above, we ignored the imposed boundedness restrictions
on $\sigma^2$. An ad hoc way to obtain those is to truncate the isotonic regression at
$1/\epsilon_n^2$ from above and at $\epsilon_n^2$ from below. Alternatively, a solution respecting the bounds
can be found by using the bounds $1/\epsilon_n^2$ and $\epsilon_n^2$ as upper and lower bounds in
the PAVA. This means not allowing derivatives outside this range, starting
the algorithm at $(0, 0)$ and ending it at $(1, \sum_{t=p+1}^{n} e_t^2(\hat\alpha_n))$. This can
be achieved by a simple modification of the greatest convex minorant close
to its endpoints. This also only modifies the estimator in the tail and close
to the mode, but it has the additional advantage of sharing the property of
the isotonic regression that the integral of (the piecewise constant function)
$\hat\sigma_n^2(\cdot)$ equals the average of the squared residuals.

4. The time-varying empirical spectral process. In this section, we present 
exponential inequalities, maximal inequalities and weak convergence results 
for the empirical spectral process defined in (13). Asymptotic normality of 
the finite-dimensional distributions of $E_n(\phi)$ has been proved in [14]. Furthermore,
several applications of the empirical spectral process have been
discussed there. Let

(34) $\rho_{2,n}(\phi) := \Bigl( \frac{1}{n} \sum_{t=1}^{n} \int_{-\pi}^{\pi} \phi\bigl(\tfrac{t}{n}, \lambda\bigr)^2 \, d\lambda \Bigr)^{1/2}$

and

(35) $\tilde E_n(\phi) := \sqrt{n}\,\bigl(F_n(\phi) - EF_n(\phi)\bigr).$



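Theorem 4.1. Let $X_{t,n}$ be a Gaussian locally stationary process, and
let $\phi\colon [0,1] \times [-\pi, \pi] \to \mathbb R$ with $\rho_\infty(\phi) < \infty$ and $v(\phi) < \infty$. Then we have, for
all $\eta > 0$,

(36) $P(|\tilde E_n(\phi)| \ge \eta) \le c_1 \exp\Bigl( -c_2 \frac{\eta^2}{\rho_{2,n}(\phi)^2 + \eta\,\rho_\infty(\phi)/\sqrt{n}} \Bigr)$

and

(37) $P(|\tilde E_n(\phi)| \ge \eta) \le c_1 \exp\Bigl( -c_2 \frac{\eta}{\rho_{2,n}(\phi)} \Bigr)$

with some constants $c_1, c_2 > 0$. Furthermore, we have, for some $c_3 > 0$,

(38) $\sqrt{n}\,|EF_n(\phi) - F(\phi)| \le c_3\, n^{-1/2} \bigl(\rho_\infty(\phi) + v(\phi)\bigr).$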

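Remark 4.2. (i) Since $\rho_{2,n}(\phi)^2 \le \rho_2(\phi)^2 + \frac{1}{n}\rho_\infty(\phi)v(\phi)$, we can replace
$\rho_{2,n}(\phi)^2$ in (36) by the latter expression and $\rho_{2,n}(\phi)$ in (37) by
$\rho_2(\phi) + \bigl(\frac{1}{n}\rho_\infty(\phi)v(\phi)\bigr)^{1/2}$.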

(ii) Combining (36) and (38) leads to the following exponential inequality 
for the empirical spectral process [see (66)]: 

P{\En{(l))\ >??) <c;exp -c'2— -— — ^— ^ 

However, we prefer to use the above inequalities and to treat the bias sepa- 
rately. 

(iii) The constants $c_1, c_2, c_3$ depend on the characteristics of the process
$X_{t,n}$, but not on $n$.

The above exponential inequalities form the core of the proofs of the next 
two results which lead to asymptotic stochastic equicontinuity of the em- 
pirical spectral process. Analogously to standard empirical process theory, 
stochastic equicontinuity is crucial for proving tightness. For proving the 
rates of convergence of the NPMLE, we need more, namely rates of conver- 
gence for the modulus of continuity. These rates also follow from the results 
below. 

In the formulations of the following theorems, we use the constant 

(39) L = max(c,l)^max(i(:i,K2,i^3,l) > 0, 
where $c$ is the constant from Lemma A.4, the constants $K_1$-$K_3$ are from
Lemma A.3 and $c_3$ is from (38). None of these constants depends on $n$
or on the function classes $\mathcal F_n$. They depend only on the constant $K$ of the
underlying process $X_{t,n}$ given in Definition 2.1.

Theorem 4.3 (Sequence of "finite-dimensional" index classes). Suppose
that $X_{t,n}$ is a Gaussian locally stationary process. Further, let $\Phi_n$ be a function
class satisfying

(40) $\sup_{\phi \in \Phi_n} \rho_2(\phi) \le \tau_2 < \infty.$

Assume further that there exist constants $A > 0$ and $k_n \ge 1$ such that

$\log N(\epsilon, \Phi_n, \rho_2) \le A k_n \log\bigl(\tfrac{n}{\epsilon}\bigr) \quad \forall \epsilon > 0.$

Let $d \ge 1$. Suppose that $\eta > 0$ satisfies the conditions



(41) r]<d- 



Poo 

1 



(42) T] >cn 2 logn, 



2 ^ 2AAd 2, 1 +/8Ln^logn 



(43) ri'> r2"A:„log^ 

where c > 2^Lmax(t;s, 1). Then there exists a set Bn with lim^^oo P{Bn) = 
1, such that the inequality 

(44) P(sup |^„(,/,)| >r?;B„) <ciexp(-;^4) 

holds, where ci,C2 > are the constants from (36). 

The next result allows for richer model classes. It is formulated for fixed
$n$ and, therefore, the class $\Phi$ may again depend on $n$. Since we apply this
result for global estimation with a fixed class $\mathcal F^*$, it is formulated as if $\Phi$
were fixed.

Theorem 4.4 ("Infinite-dimensional" index class). Let $X_{t,n}$ be a Gaussian
locally stationary process. Let $\Phi$ satisfy Assumption 2.4(c) (with $\Phi$ replacing
$\mathcal F$) and suppose that (40) holds with $\tau_2 > 0$. Further, let $c_1, c_2$ be
the constants from (37). There exists a set $B_n$ with $\lim_{n\to\infty} P(B_n) = 1$ such
that

(45) P(sup|^„(0)| >77,5„) <3ciexp(-^^ 
for all r] > satisfying the following conditions. Let a = H~^{y^)- Then 

(46) r]>2^Lm.ax{pao,v^,l)n~^^'^logn 
and either ^g^^nlogn > or the following hold: 

(47) r^>2^-^T2, 

192 

(48) r]> / HMdu. 

C2 J-^ 

20i7ilogn 

By applying the above results to the class of differences $\{\phi_1 - \phi_2 : \phi_1, \phi_2 \in
\Phi_{(n)},\ \rho_2(\phi_1 - \phi_2) \le \delta_n\}$, we obtain rates of convergence for the modulus
of continuity of the time-varying empirical spectral process. This is utilized
in the proof of Theorem 2.6. As a special case, we obtain the asymptotic
equicontinuity of $\{E_n(\phi),\ \phi \in \Phi\}$. Together with the convergence of the
finite-dimensional distributions [14], this leads to the following functional
central limit theorem (for weak convergence in $\ell^\infty(\Phi)$ we refer to [23], Section
1.5):

Theorem 4.5. Suppose that $X_{t,n}$ is a Gaussian locally stationary process.
Let $\Phi$ be such that Assumption 2.4(c) holds (for $\Phi$ replacing $\mathcal F$) and

(49) $\int_0^1 H(u, \Phi, \rho_2)\, du < \infty.$

Then the process $\{E_n(\phi);\ \phi \in \Phi\}$ converges weakly in $\ell^\infty(\Phi)$ to a tight mean-zero
Gaussian process $\{E(\phi);\ \phi \in \Phi\}$ with

cov {E{4>j),E{(t>k)) 

= 27T r (t>j{u,X)[Mu,X) + Mu,-m\u,^)dXdu. 

Jo \\h\\^ J-TT 

5. Proofs. The following lemma establishes the relation between the L2- 
norm and the Kullback-Leibler information divergence. This relation is a 
key ingredient in the proof of Theorem 2.6. 

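Lemma 5.1. Let $\mathcal F$ be such that $f_{\mathcal F}$ exists and is unique and $\mathcal L(g) < \infty$
for all $g \in \mathcal F$.

(i) Assume that for some constant $0 < M^* < \infty$, we have $\sup_{u,\lambda} |\phi(u,\lambda)| \le
M^*$ for all $\phi \in \mathcal F^*$. If $\mathcal F^*$ is convex, then for all $g \in \mathcal F$,

$\frac{1}{8\pi (M^*)^2}\, \rho_2^2\Bigl(\frac{1}{g}, \frac{1}{f_{\mathcal F}}\Bigr) \le \mathcal L(g) - \mathcal L(f_{\mathcal F}).$

If the model is correctly specified, that is, $f_{\mathcal F} = f$, then the above inequality
holds without the convexity assumption.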
holds without the convexity assumption. 
(ii) Assume that for some constant $0 < M_* < \infty$, we have $M_* \le \inf_{u,\lambda} |\phi(u,\lambda)|$
for all $\phi \in \mathcal F^*$. Then we have, with $\max(\sup_{u,\lambda} |f(u,\lambda)|, 1/M_*) \le \Gamma < \infty$, for
all $g \in \mathcal F$,



(50) C{g)-C{f)<"-^pl^ljj and 

(51) Cig) - C{fr) < ^ max {^pl Q, -1) , Q, -1 
where pi denotes the Li-norm on [0,1] x [— 7r,7r]. 

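Proof. For $g \in \mathcal F$, let $w = 1/g \in \mathcal F^*$. We set

$\Psi(w) = \mathcal L\bigl(\tfrac{1}{w}\bigr) = \frac{1}{4\pi} \int_0^1 \int_{-\pi}^{\pi} \{ -\log w(u,\lambda) + w(u,\lambda) f(u,\lambda) \}\, d\lambda\, du.$

Direct calculations yield the following Gateaux derivative $\delta\Psi$ of $\Psi$: For $v, w \in \mathcal F^*$,

(52) $\delta\Psi(v, w) := \frac{\partial}{\partial t} \Psi(v + tw)\Big|_{t=0} = \frac{1}{4\pi} \int_0^1 \int_{-\pi}^{\pi} \Bigl( -\frac{w}{v} + w f \Bigr)\, d\lambda\, du.$

It follows that

(53) $\Psi(w) - \Psi(v) - \delta\Psi(v, w - v) = \frac{1}{4\pi} \int_0^1 \int_{-\pi}^{\pi} \Bigl\{ -\log\frac{w}{v} + \frac{w}{v} - 1 \Bigr\}\, d\lambda\, du.$

Since $v = v(u,\lambda)$ and $w = w(u,\lambda)$ are uniformly bounded and $\log(1 + x) =
x + R(x)$, where $R(x) = -\frac{x^2}{2(1+\theta x)^2}$ with some $\theta \in (0, 1)$, we obtain uniformly
in $u$ and $\lambda$,

(54) $-\log\frac{w}{v} + \frac{w}{v} - 1 = \frac{1}{2} \frac{(w - v)^2}{(v + \theta(w - v))^2} \ge \frac{1}{2(M^*)^2} (w - v)^2.$

Hence,

(55) $\Psi(w) - \Psi(v) - \delta\Psi(v, w - v) \ge \frac{1}{8\pi (M^*)^2}\, \rho_2(w, v)^2.$

Therefore, the function $\Psi$ is strongly convex on $\mathcal F^*$. Corollary 10.3.3 of
Eggermont and LaRiccia [15] now implies the inequality from (i), provided
$\mathcal F^*$ is convex. Note further that if the model is correct, that is, if $f_{\mathcal F} = f$,
then it is straightforward to see that (53) holds for $v = 1/f$ if we formally
let $\delta\Psi(\tfrac{1}{f}, w - \tfrac{1}{f}) = 0$. In other words, if the model is correctly specified, the
result follows directly from (55) without using the convexity assumption.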

As for the second part of the lemma, observe that similarly to (54), the 
assumed boundedness of g and fjr implies that 

47r Jo J-TT I g g J Svr \g fr 
1 fl fi" 



Further, S^{j^ ,w - j:^) = ' Miw - 1/ M dudX. Hence, 



5^ 



(1. — 



Notice that if f = fj^, then 6"^{j^ ,w — j^) = 0. The result follows by using 
(53). □ 

Proof of Theorem 2.6. First we prove part II. Lemma 5.1 and (18),
along with the fact $\tilde E_n(\phi) = \sqrt{n}\,(F_n(\phi) - EF_n(\phi))$, lead to the relation



P 



(56) 



where 



<P 



sup 

Rn> " 



> 



p 



167r(M*)2 

V/^ fn 



Rn — -r- (EF„ — F) 
47r 



+ Rlog{M-Rlog{fn)- 



Note that the expectation operator $E$ only operates on $F_n$ and not on $\hat f_n$.
Theorem 4.1, Lemma A.2 and (38) imply that the second term in (56) tends
to zero as $n \to \infty$.

We now use the so-called "peeling device" (e.g., [22]) in combination with
Theorem 4.4. The first term in (56) is bounded by



P 



sup 



\En( 



> 



n 



2po.>pM)>CSn P2('/'>V') ' 4(M*)2 



(57) 



j=0 
3=0 



sup 



> 



n 



^A^(C2^■<5„)2 



sup I E,^ 

P2{<t>,ip)<C2i+^5n 



> 



4(M*)2 



where Kn is such that [C2^"+^5„J = 2poo- For sufficiently large C, our 
assumptions ensure that the conditions of Theorem 4.4 are satisfied for all 
j = l,...,Kn, [with rj = ^{C2i5nf/^{M*f and T2 = C2^+^6n when (5„ is 
chosen as stated in the theorem we are proving]. Hence, on an exceptional 
set whose probability tends to zero as n— > oo, we obtain the bound 



<3E 



ci exp< -C2 



j=0 



64(M 



*\2 r ' 
which tends to zero as C ^ oo because nJ^, by assumption, is bounded away 
from zero. 

The proof of part I has a similar structure. First, we derive the analog
to (18). Let $\pi_n = \pi_n(1/f_{\mathcal F}) \in \mathcal F_n^*$ denote an approximation to $1/f_{\mathcal F}$ in $\mathcal F_n^*$
(cf. Assumption 2.4). Then



o</:n(i/vrn) -/:„(/„) 

(58) =(£„(l/7r„)-£(lK)) 
-(£(/^)-£(l/7r„)). 



(£„(/„)-£(/„))-(£(/„) -£(/^)) 



In the following, we prove the case of a correctly specified model, that is,
$f = f_{\mathcal F}$. Using (50) from Lemma 5.1, we obtain



1 



1 1 



87r(A/* 

<(£„(l/^„)-£(l/7r„)) 

(59) 



47r ^/n " 

< T- sup {Fn 

ge:Fn 



1 



TT, 



(CMn) - Cifn)) + (^(l/vTn) " C{f:p)) 

.(l/vrj - i?iog(/n) + (/:(l/vr„) - CiM) 



~Y)^ i?log 
JnV 

^) f^Tn - - ) + 2 sup \Rlog{g)\ + ^i-pli^n, V/^ 

V gJ g^T^ 27r 



Note that Vl = max(l/M^,,m), where m = sup„ /(n. A). Hence, = 1/M* 
X max(l, M^,m) < c/M^, with c = max(l,m). Let 7r.„ be a "good" approx- 
imation to l//jF in .F*, in the sense that /32('7r„, 1//jf) < 2a^. It follows 
that on the set {^2(^5 7~) ^ (C'^ + '^)^n}' have, by definition of 6n and 
by using the triangle inequality, that pU^jiTn) > pI{^,jz) - Pli'^n^jz) > 
(C^ + 2)J^ - 2a^^ > C^(5.^. Hence, we obtain, by utihzing (59), 



P 



pl[j,j-^]>{C-' + 2)6l 



< P 



sup 

.5e.f"n ;p2 ( ^ , 7r„ ) > C<5„ 



> 



n 



VTr, 



+ P 



plilTnA/M > 



P2.g 

16c(M*)2 



4(M*)2 



P 



Rn> 



2-1 



327r(M*)2 



By definition of $\delta_n$, the last term is zero for $C$ sufficiently large. At this
point, the proof is completely analogous to the proof of part II. The same
arguments as used above show that the second term on the right-hand side
can be made arbitrarily small by choosing $C$ sufficiently large. To bound
the first term in the last inequality, we use the peeling device as above and
Theorem 4.3 [with d = max(j^py^, 1)] to bound the sum analogously to (57). 
We end up with the bound 

for some constant $c > 0$, and this sum also tends to $0$ as $C \to \infty$.

The proof for f ^ fj^ is, mutatis mutandis, the same. Instead of (58), we 
start with aversion of (18) where 1// is replaced by We then proceed 

analogously to the proof above. □ 

Proof of Theorem 2.8. As in (18), we obtain 
< £(/„) - £(/^) < (Cn - C){fr) - {In - C){fn) 

< {Cn - C){f^) - {Cn - C){Jn) + Op{5l/{M*f) 

and, therefore, the proof of Theorem 2.6 applies, where Rn is replaced by 
R^ = l-{EFn-F)^j--j^+R,,,{f:r)-Rios{fn) + op{6l/{M*)'). □ 

Proof of Theorem 3.1. The proof is an application of Theorem 2.6. 
We first derive the approximation error. For an increasing function £ Ai 
with e < C7^(-) < b, let € Cn with aj = s^{j /kn). Clearly, if 1/e^ > 6 and 

< e, then 



{a^{u)-sl{u)fdu<h/kn. 

In other words, the approximation error of Cn as a sieve for M is 0(1/A;„), 
provided e„ 0, which implies that an = 0{^). Next, we determine a bound 
on the metric entropy of J-"* . Observe that as a space of functions bounded 
by 1/e^ and spanned by A;„ functions, the metric entropy of C„ satisfies 
log7V(r/,C„,p2) < Akn\og{l/{elr])) for some A > (e.g., [22], Corollary 2.6). 
Next, we derive a bound for the metric entropy of Wp = {wa',ot € Ap}. First, 
note that \ogN{r],Ap, P2) < Alog(l/r/) for some A > (since Ap is bounded). 
Since ^^.(A) < 2^? and 

p 

i=i 

this leads to the bound logA^(r/,14^p,p2) ^ ^log(l/??) for some vl > 0. The 
two bounds on the metric entropy of Cn and Wp now translate into the 
bound log N{r],J^* , P2) < Akn log{l / {e^?])) for some ^ > 0. This can be seen 
as follows. First notice that with s^, the approximation on in Cn defined 
above, we have 



P2{ ^W^ai -^W^a2 ) < 



Observe further that ^ G C„ ^ := -s^ G C„} and note that as a class of 
functions, is very similar to the class Cn- The only difference is that 
consists of decreasing functions rather than increasing functions. In 
particular, the bound given above for the metric entropy of C„ also applies 
to the metric entropy of Since, in addition, < \ and Wa<\^ one 
sees that 

A^(r/, , P2) < iV(??, C-\ P2)iV(r?e2 , VFp, P2) . 

This leads to the asserted bound for the metric entropy. 

Note further that we can choose M* = 0(l/e^). As in Example 2.7, we 
avoid the lower bound by looking at -Riog (50,0-2) separately: We have for 
all n that /^^ log5(ct,(T2 (^^i c^-^ = 27rlog((T^(n)/(27r)) and, therefore. 



sup |i?iog(5')l =0 



log(l/en] 



n 

which is sufficient in our situation [cf. (56)]. Hence, the 'best' rate for the 



NPMLE which can be obtained from (26) follows by balancing ^^ g"fc"J°s" 
and a„, where here, = O(^). The latter follows as in Example 2.7. In the 
correctly specified case (i.e., the variance function actually is monotonic), 
this gives A:, = (^)i/3 and, hence, the rate bn = ^n'^^^'^T^y'^'^ ■ If we 
choose l/cn = 0((logn)^/^), then the rate becomes n~'^/^logn, as asserted. 
□ 



Proof of Theorem 3.2. For the true spectral density f{u, A) = s^(u)/ 
{2'iTv{\)), we assume, without loss of generality, that J^^logv{X) dX = 0. 
This can be achieved by multiplying v{-) by an adequate constant. Since 
/^,r^°S'"^"('^) ^^'^ = (Kolmogorov's formula) and (ao, o'oi')) minimizes £{§^^^2) 
over Ap X 7W , we have 

that is, 

r !f^^A> r ^""^^^ dx 

J-n V{X) ~ 7-7r V{X) 

Hence, we have 
^9o 



=-/' 






47r Jo 












>-/' 




47r Jo 








2) 





+ 



+ 



dX 



dX 
du 



du 



For S C„, let Ti-is^) = €{§^^ ,,2). Then, obviously, Uq = argmmg2gc„ 'H{s'^). 
As in the proof of Lemma 5.1, it follows that 



(60) ^9^,^) - ^9ao,.f,) = n^') - n^l) > ^4.P2 



1 1 



Here, we use the fact that an upper bound for the functions in C.„ is given 
by Assertion (30) now follows since we know from Theorem 3.1, the 

proof of Theorem 2.6 and Lemma 5.1 that 



1 1 



^(£(<72?-.)-^(5„o,.g)) = Op(^/'2(^;^,;^jj=Op(n-^/'(logn)^). 

Here, we used the fact that an upper bound for the functions in J^* is given 
by and the fact that with our choice of e„ , the rate of convergence for the 

NPMLE for the spectral density is Op(n~^/^ logn) (cf. Theorem 3.1). □ 

Proof of Proposition 3.3. We obtain, with (8)-(10) and Kolmogorov's 
formula, 

1 



2n 



t=i 
1 



+ ^II^27IT J2 «i«fc^[t+l/2+0-fc)/2],n 
t=l Vn/ ij 



j,k=0 



^ ^[t+l/2-{j-k)/2],n'^l<[t+l/2±{j-k)/2]<n: 



where ao = l. The second summand is equal to 



1 

2n 



j,k=0 



{t: l<t-j,t~k<n} 



2 , lt~j/2-k/2] ^ Xt-j,nXt-k,n ■ 
By using the definition of Cn{a, cr^) in (31), Lemma A. 4 and the monotonic- 
ity of (T^(-), we therefore obtain, with 6n = n~^^^logn and M* = 



sup 



{£„(a,cr^) - log(27r)} - Criiga,a^) 



On 



log(l/en) logn 



n 

2 // n,r*\2\ 



op{5i/{M*r). 



Proof of Theorem 4.1. We start with two technical lemmas. Direct 
calculation shows that 

(61) i^n(<A) = iz;C/n(^0)Zn, 

where Un{4')jk = '^(^L'^^JiJ ~ ^) ^^^'^ denotes the largest integer less 
than or equal to $x$. The properties of $U_n(\phi)$ have been investigated under
different assumptions in [10]. In this paper, we only need the following result
on $\|U_n(\phi)\|_{\mathrm{spec}}$ and $\|U_n(\phi)\|_2$, where $\|A\|_2 := \operatorname{tr}(A^H A)^{1/2} = (\sum_{i,j} |a_{ij}|^2)^{1/2}$ is
the Euclidean norm and $\|A\|_{\mathrm{spec}} := \sup_{\|x\|_2 = 1} \|Ax\|_2 = \max\{\sqrt{\lambda} \mid \lambda \text{ eigenvalue
of } A^H A\}$ is the spectral norm:



Lemma 5.2. With $\rho_2(\phi)$, $\rho_{2,n}(\phi)$, $\rho_\infty(\phi)$ and $v(\phi)$ as defined in (21),
(34) and (23), we have

(62) \\Uni<P)\\ spec 

and 

(63) $n^{-1}\|U_n(\phi)\|_2^2 \le 2\pi \rho_{2,n}(\phi)^2 \le 2\pi \rho_2(\phi)^2 + \frac{2\pi}{n} \rho_\infty(\phi) v(\phi).$
Proof. Let 

<i>jk ■■=(!>(- 

Then for x £ C" with ||x||2 = 1, 

n 

\\Un{(t))x\\2= ^ Xi(i)ji4>jkXk= ^ Xjj^l4>j^j+l4>jj+mXj+m 

(64) ij,fc=l ^ j/,m 

< ^SUp|(^jj+£| SUp|(^jj+,„| ^ \Xj+iiXj+rn\, 
t.ra j J 3 

where the range of summation is such that 1 < j ' + ^, j + m < n. Since 
T,j\xj-i-eXj+rn\ < \\x\\2 = Ij we obtain the first part. Furthermore, we have, 
with Parseval's equality. 
\Un 



n 



j,k=i "'t=ie=~oo 

oo 



2lTp2,n{(py 



(65) 



„1 oo 

/„ s l^<' 

n pl/n oo 

+E/ E 



t=i 



i=-oc 
2-K 



t-l 



n 



+ x,£ 



dx 



<27Tp2{(t>r + —poo{(t>)v{(t>)- 



n 



□ 



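Lemma 5.3. If $\Sigma_n$ is the covariance matrix of the random vector
$(X_{1,n}, \ldots, X_{n,n})'$, then

$\|\Sigma_n^{1/2}\|_{\mathrm{spec}}^2 \le \sum_{k=-(n-1)}^{n-1} \sup_t |\operatorname{cov}(X_{t,n}, X_{t+k,n})|,$

which is uniformly bounded under Assumption 2.1.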

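Proof. We have, for $x \in \mathbb C^n$ with $\|x\|_2 = 1$ and $\sigma_{j,k} = \operatorname{cov}(X_{j,n}, X_{k,n})$,

$\|\Sigma_n^{1/2} x\|_2^2 = \sum_{j,k=1}^{n} \bar x_j \sigma_{j,k} x_k \le \sum_{k} \sum_{j} |x_j\, \sigma_{j,j+k}\, x_{j+k}|.$

An application of the Cauchy-Schwarz inequality gives the upper bound.
The bound for the right-hand side follows from [14], Proposition 4.2. □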
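The bound of Lemma 5.3 can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it builds the covariance matrix of a hypothetical time-varying MA(2) process (the coefficient curves are made up for this example) and compares the spectral norm of $\Sigma_n$, which coincides with $\|\Sigma_n^{1/2}\|_{\mathrm{spec}}^2$ for a positive semidefinite matrix, with the sum over lags of the largest absolute covariance at each lag.

```python
import numpy as np

n = 60
u = np.arange(1, n + 1) / n

# Hypothetical time-varying MA(2): X_t = eps_t + a1(t/n) eps_{t-1} + a2(t/n) eps_{t-2}.
# Row t of A holds the coefficients of (eps_{-1}, eps_0, ..., eps_n) in X_t.
A = np.zeros((n, n + 2))
for t in range(n):
    A[t, t + 2] = 1.0               # coefficient of eps_t
    A[t, t + 1] = 0.8 * u[t]        # coefficient of eps_{t-1}
    A[t, t] = -0.4 * (1.0 - u[t])   # coefficient of eps_{t-2}

Sigma = A @ A.T                     # covariance matrix of (X_1, ..., X_n)

# Spectral norm of Sigma (largest singular value).
spec = np.linalg.norm(Sigma, 2)

# Sum over lags k of sup_t |cov(X_t, X_{t+k})|, read off the k-th diagonals.
bound = sum(np.max(np.abs(np.diagonal(Sigma, k))) for k in range(-(n - 1), n))

print(f"spectral norm = {spec:.4f}, lag-sum bound = {bound:.4f}")
```

The inequality holds for any symmetric matrix, since the spectral norm is dominated by the maximal absolute row sum; the substance of the lemma is that the right-hand side stays bounded in $n$ for locally stationary processes.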



We now continue with the proof of Theorem 4.1: 

-2n 



Let Bn := T}r!^Un{^4>)T}J'^ and H,, := - AA(0,/„). We have 



n 



-^'\rnBnY^-tT{Bn)]. 



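Since $B_n$ is real and symmetric, there exists an orthonormal matrix $U = U_n$
with $U'U = UU' = I_n$ and $U'B_nU = \operatorname{diag}(\lambda_{1,n}, \ldots, \lambda_{n,n})$. Let $Z_n := U'Y_n \sim
\mathcal N(0, I_n)$. We have

$\tilde E_n(\phi) = n^{-1/2} [ Z_n' U' B_n U Z_n - \operatorname{tr}(B_n) ] = n^{-1/2} \sum_{i=1}^{n} \lambda_{i,n} (Z_i^2 - 1).$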

For L and i?^ as defined in Proposition A.l, we obtain, with Lemma 5.2 and 
Lemma 5.3, 



L = max{Ai,n,---,A„,„} 



spec 
< 



1 

2^' 



spec 



and 



1=1 



1 



— 7^ " " Ws-pec 



2-K 
Un 



Proposition A.1 now implies (36) and (37). Assertion (38) follows from [14],
Lemma 4.3(i). The relation $\rho_{2,n}(\phi)^2 \le \rho_2(\phi)^2 + \frac{1}{n}\rho_\infty(\phi)v(\phi)$ [see Remark 4.2(i)]
has been proven in (65). Furthermore,



^{\Enm >V)< Pi\Enm > v/2) + P(V^|EF„(<^) - F(</.)| > rj/2) 



(66) 



< ci exp -C2 



+ Ci exp 



P2{ 



I 



if 



which implies the assertion of Remark 4.2(ii). □ 

Proof of Theorem 4.3. We only prove the result for d = 1. The neces- 
sary modifications for arbitrary d > are obvious. Let Bn = {maxj=i^...^„ \Xt,n\ < 
C\/Togn}, where c is the constant from Lemma A. 4. This lemma says that 
lim„_^oo ^(^n) = 1- Let '^^(gj^^j^) be the smallest approximating set at 
level — according to the definition of the covering numbers so that 

SLnlogn ^ ° 



denote 



^ SLnlogn V --V SLnlogn '^'^a). For (/> g ^, let ^ ^^nv SLnlogn 

the best approximation in iSni gi^Jl^^^ ) ^- With this notation, we have 



(67) 



P sup I £;„((/)) I > r];Bn 
< P 



max 



+ 



r) 



\Er 



SLn log n ^ 

sup 



>rj/2 
\EJ 



i;)\>r]/2;Bn)=I + II. 



0,V.6*;p2(<^,^)<8X;j^ 

Using assumptions (41)-(43), we have 



I < ci exp < Akn log(8Ln^ log n/r]) — C2 — 

I T. 



?7V4 



(68) 



< ciexp< ^A:^ log(8Ln^ logn/r]) 



2 _|_ VPoa I PoqV 



C2 



r/V4 



< ci exp 



C2 T 
24(i r| 
To complete the proof, we now show that for $n$ sufficiently large, we have
$II = 0$ with $B_n = \{\max_{t=1,\ldots,n} |X_{t,n}| \le c\sqrt{\log n}\}$, where $c$ is the constant from
Lemma A.4 [i.e., $P(B_n) \to 1$]. In order to see this, we replace $\phi$ by

(69) $\phi_n^*(u, \lambda) = n \int_{u - 1/n}^{u} \phi(v, \lambda)\, dv$ [with $\phi(u, \lambda) = 0$ for $u < 0$].

Then, on $B_n$, we have, by using Lemma A.3, the facts that $\rho_\infty(\phi - \psi) \le 2\rho_\infty$
and $v(\phi - \psi) \le 2v$, as well as the definitions of $B_n$ and $L$, that

\En{ct^-n<V^\Fn{^-^)-Fn{4>*n-rn)\ 

(70) +^\EFn{(t>*n-rn)-^Fn{^-n 

log n 2L r] 7] 7] 7] 

< 2LvT.—=r- + Lp2{(l) - V)n\ogn + —V <- + - + - = -. 
Jn Jn 8 4 8 2 



For the last inequality to hold, we need rj > 2^Lus^^^, which follows from (42). 
Hence, we have II = 0. □ 

Proof of Theorem 4.4. We use the quantities $B_n$, $\phi_n^*$ introduced in
the proof of Theorem 4.3. Also, recall the definition of $L$ given in (39). Let

(71) $\tilde E_n^*(\phi) = \sqrt{n}\,\bigl(F_n(\phi_n^*) - EF_n(\phi_n^*)\bigr).$

On $B_n$, we have, by using Lemma A.3, that

\{K - Enm\ < V^\Fn{(t>*n) " F„(0)| + (EF„ ((/>;) - EF„((/.)| 

< L—j=^ <^ + <- + - = -, 

/n Jn 4 4 2 



where the last inequality follows from assumption (46). Hence, 



P[snv\En{(p)\> il,Bn] < P[snv\E*n{(l,)\> r]/2,Bn] . 

We now prove the asserted maximal inequality for $\tilde E_n^*$. The general idea is
to utilize the chaining device, as in [1].

First, we consider the case a > giJl^gn • ^^^^ case, choose 6o = a and 
let C2 > be the constant from (37). Then there exist numbers < 6j,j = 
1, . . . , K < oo, with a = 6o > 6i > ■ ■ ■ > 6k = sLniogn ^ such that with rjj^i = 
^6j+iH^{6j+i),j = 1,...,K, we have 

n 24 ~ 
(72) {>- H^{s)ds>J2r^,^,. 

20 Ln log n J=U 

The first inequality follows from assumption (48) and the second follows by 
using the property 6j^i < 5j/2 (see below for the construction of the Sj). 
For each of the numbers $\delta_j$, choose a finite subset $A_j$ corresponding to the
definition of covering numbers $N(\delta_j, \Phi, \rho_2)$. In other words, the set $A_j$ consists
of the smallest possible number $N_j = N(\delta_j, \Phi, \rho_2)$ of midpoints of $\rho_2$-balls
of radius $\delta_j$ such that the corresponding balls cover $\Phi$. Now, telescope

(73) $\tilde E_n^*(\phi) = \tilde E_n^*(\phi_0) + \sum_{j=0}^{K-1} \tilde E_n^*(\phi_{j+1} - \phi_j) + \tilde E_n^*(\phi - \phi_K),$

where the $\phi_j$ are the approximating functions to $\phi$ from $A_j$, that is, $\rho_2(\phi, \phi_j) \le
\delta_j$. Now take absolute values on both sides of (73), apply the triangle
inequality on the right-hand side and then take the suprema on both sides.
This leads to



P(sup|^:(<^)|>r?/2,i?„) 

K-l 

<p(sup\E:{<Pi)\>v/4) + E iVjiVj+isupP(|£;:(0,+i-<^,)| >r?,+i) 

^■*6* ^ j^Q <j)&<S> 



+ P[sup\E*{^-^K)\>v/8,Bn 
= 1 + 11 + III. 

Note that the first two terms only depend on the approximating functions,
and for every fixed $j$, those are finite in number. In contrast to that, the
third term generally depends on infinitely many $\phi$ and, hence, this term is
crucial. The way to treat it actually differs from case to case.

Hence, using the exponential inequality (80), we have, by definition of a, 
that 

(74) /<ciexp|if(a) -C2 — I = ciexp|--^ — 

In order to estimate //, we need the exact definition of the 5j. Using the 
approach of Alexander [1], an appropriate choice [satisfying (72)] is 

^^^^ = SLnlogn ^ ''^P^'' " ^^''^'^ ^^""^ " ^^^^^'^^ 



and K = min{j : 5j = ^iniogn) - With these choices, we obtain 

—bj^xH{pj^\ 

K-l K-l 



II<Y1 ciexp(2#(5j+i) - 02 "' '^'' ] < E ciew{-H{6j+i)} 

j=o ^ -^J+i J i=o 

< E'ciexp{-2^+i#(a)}= E'ciexp|-2^|;^|<2ci exp|-|;^|. 



j=0 j=0 
where the last inequahty holds for rj > ^log2. 

The proof of the fact that $III = 0$ is similar to the proof of $II = 0$ in
Theorem 4.3. Here, we again have to use assumption (46). We omit the
details.

It remains to consider the case a < gi^Jl^^^ ■ Here, we choose 5o = gj^^'j^g^ 
and K = 0. Hence, II = and we only have to deal with / and ///. Since 
H{6o) <H{a), we immediately get [cf. (74)] that / < ci exp{-f :^}. The 
fact that /// = follows similarly to (70). □ 



APPENDIX: AUXILIARY RESULTS 

First we prove a Bernstein inequality for $\chi_1^2$-variables which is the basis
for the Bernstein inequality derived in Section 4.

Proposition A.1. Let $Z_1, \ldots, Z_n$ be independent standard normally
distributed random variables and $\lambda_1, \ldots, \lambda_n$ be positive numbers. Define

(75) $R^2 = \frac{1}{n} \sum_{i=1}^{n} \lambda_i^2$ and $L = \max\{\lambda_1, \ldots, \lambda_n\}.$

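Then we have for all $\eta > 0$,

(76) $P\Bigl( \Bigl| n^{-1/2} \sum_{i=1}^{n} \lambda_i (Z_i^2 - 1) \Bigr| \ge \eta \Bigr) \le 2 \exp\Bigl( -\frac{1}{8} \frac{\eta^2}{R^2 + L\eta/\sqrt{n}} \Bigr)$

and

(77) $P\Bigl( \Bigl| n^{-1/2} \sum_{i=1}^{n} \lambda_i (Z_i^2 - 1) \Bigr| \ge \eta \Bigr) \le 6 \exp\Bigl( -\frac{\eta}{16 R} \Bigr).$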



Proof. One possibility is a direct proof via moment generating functions.
Instead, we apply a general Bernstein inequality for independent variables.
It can be shown that $E|Z_i^2 - 1|^m \le 4^{m-1}(m-1)!$. Therefore, we have
for $m \ge 2$,

(78) $\frac{1}{n} \sum_{i=1}^{n} \lambda_i^m\, E|Z_i^2 - 1|^m \le \frac{m!}{2} (4L)^{m-2} (2R)^2.$

For example, Lemma 8.6 in [22] now implies (76). Since $L \le R n^{1/2}$, (76)
implies (77) [consider the cases $\eta < R$ and $\eta \ge R$ separately and keep in
mind that $\exp(-x^2) \le \exp(-x + 1)$]. □
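A quick Monte Carlo experiment can illustrate how a Bernstein-type bound of this kind compares with simulated tail probabilities. The sketch below is an illustration only: it assumes the bound takes the form $2\exp(-\eta^2 / (8(R^2 + L\eta/\sqrt{n})))$, and the weights `lam`, the sample sizes and the thresholds are hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 20000

# Hypothetical positive weights lambda_1, ..., lambda_n.
lam = rng.uniform(0.2, 1.5, size=n)
R2 = np.mean(lam ** 2)          # R^2 = (1/n) * sum lambda_i^2
L = lam.max()                   # L = max lambda_i

# Simulate S = n^{-1/2} * sum_i lambda_i (Z_i^2 - 1) for independent N(0,1) Z_i.
Z = rng.standard_normal((reps, n))
S = (Z ** 2 - 1.0) @ lam / np.sqrt(n)

results = []
for eta in (1.0, 2.0, 4.0, 6.0):
    empirical = float(np.mean(np.abs(S) >= eta))
    bound = float(2.0 * np.exp(-eta ** 2 / (8.0 * (R2 + L * eta / np.sqrt(n)))))
    results.append((eta, empirical, bound))
    print(f"eta = {eta:.0f}: empirical tail = {empirical:.4f}, bound = {bound:.4f}")
```

The bound is conservative by design: it is meant to hold for every $n$ and every choice of weights, which is what the chaining arguments in Section 4 require.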



Recall now the definition of $R_{\log}(g)$ given in (19).

Lemma A.2. Let $\mathcal F_n$ be such that Assumption 2.4(c) holds. Then we
have



sup \R\og{g)\ = O 



Proof. We have, with 

\RioM\ = 

1 



1/9, 



< 



:r / -y2^og(i)[-,x) - iog(t){u,x)du 

t-1 



dX 



1 



E 

t=i 



l/n 



log( 



n 



log( 



n 



+ x,X 



dxdX 



0{n 



TT fl/n 



t=l 



sup \4>{u,X) ^1 

u,u 



t-1 



+ X, A 



n 



dx dX 



□ 



In the proof of Theorem 4.4, we used $\tilde E_n(\phi_n^*)$ instead of $\tilde E_n(\phi)$, where

$\phi_n^*(u, \lambda) = n \int_{u - 1/n}^{u} \phi(v, \lambda)\, dv$ [with $\phi(u, \lambda) = 0$ for $u < 0$].

The reason for doing so is that otherwise, we would have needed the exponential
inequality (37) to hold with $\rho_2(\phi)$ instead of $\rho_{2,n}(\phi)$. Such an
inequality does not hold. Instead, we exploit the following property of $\phi_n^*$:



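(79) $\rho_{2,n}(\phi_n^*)^2 = \frac{1}{n} \sum_{t=1}^{n} \int_{-\pi}^{\pi} \Bigl( n \int_{(t-1)/n}^{t/n} \phi(u, \lambda)\, du \Bigr)^2 d\lambda \le \sum_{t=1}^{n} \int_{-\pi}^{\pi} \int_{(t-1)/n}^{t/n} \phi^2(u, \lambda)\, du\, d\lambda = \rho_2(\phi)^2.$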

Since the assertion and the proof of Theorem 4.1 are for $n$ fixed, we obtain
from (37)

(80) $P(|\tilde E_n(\phi_n^*)| \ge \eta) \le c_1 \exp\Bigl( -c_2 \frac{\eta}{\rho_2(\phi)} \Bigr).$

We note that $\rho_\infty(\phi_n^*) \le \rho_\infty(\phi)$ and $v(\phi_n^*) \le v(\phi)$, which is straightforward.
The following properties are used in the proofs above:

Lemma A. 3. Let X^^n be a Gaussian locally stationary process. Then we 
have, with := maxj=i^...^„ , 



(81) 



|F„(0)-F„(C)|<:^Xf„)t;s(</>), 



n 
(82) 

(83) 
(84) 

Proof. We have 

\Fn{<P)-Fn{<Pl)\ = 



EF„(0)-EF„(O| <:^^(0), 

n 



Fm < mV^Xf^^ + l)p2{cP) and 

2 



F^cPl) - EFM)\ < KsiV^X^^ + l)p2(</'). 



1 " 

-E 



n 



n -j^ oo 



n 



: -,A J„ -,A dA 



n 



t=l k=—oo 



n 



--k 
n 



{u,-k) 



du 



n 



Inequality (82) can be seen as follows: 



|EF„(0)-EF„(C)|<i- 



1 " 

^ t=i k 



n 



-,-k 
n 



{u,-k) 



du 



X COv(X[t+i /2+fc/2] ,n , X[t+1 /2-k/2] ,n] 



Proposition 4.2 of Dahlhaus and Polonik [14] implies sup^ | cov{Xt^n, Xt+k,n)\ ^ 
, which means that the above expression is bounded by ^ ^ < 

^v{(j)) for some A' > independent of n. For the proof of (83) and (84), we 
estimate all terms separately by using the Cauchy-Schwarz inequality and 
the Parseval equality: 



l^n(C)l<-E 



i=l 



n 



* — , A J„ — , A 



n 



dX 



<KP2AK)(-J: E 



1/2 



[t+1/2— fc/2] ,r. 



t=l k:l<[t+l/2±k/2]<n 



<K3P2{cP)V^Xly 

Similarly, we obtain $|F(\phi)| \le K_3 \rho_2(\phi)$ and $|EF_n(\phi_n^*)| \le K_3 \rho_2(\phi)$. □

Lemma A.4. Let $X_{t,n}$ be a Gaussian locally stationary process. Then
there exists a $c > 0$ such that

$P\Bigl( \max_{t=1,\ldots,n} |X_{t,n}| \ge c\sqrt{\log n} \Bigr) \to 0.$
Proof. We have, for some f* G R, Vf^n ■= var(Xt^„) = J2'j^-oo "■t,nij)'^ < 
V* uniformly in t and n. Since Xt^n is Gaussian, this imphes for c > \/2, 



max 

i=l....,n 



> cvioen I < P ( max 

~ ' ~ * t=l,...,n 



> cvlogn 



< ^exp 

t=i 



log n 
~2 



0. 



□ 
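The statement of Lemma A.4 can be illustrated in the simplest Gaussian case. The sketch below is a simplification and not the setting of the lemma in full generality: it uses independent standard normal variables (so the variance bound is 1), a hypothetical constant `c = 2`, and compares the simulated frequency of the event $\{\max_t |X_t| > c\sqrt{\log n}\}$ with the elementary union bound $n \cdot 2\exp(-c^2 \log n / 2) = 2 n^{1 - c^2/2}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, c = 500, 4000, 2.0

# Simplest Gaussian case: i.i.d. N(0, 1) observations, variance bound v* = 1.
X = rng.standard_normal((reps, n))

threshold = c * np.sqrt(np.log(n))
empirical = float(np.mean(np.abs(X).max(axis=1) > threshold))

# Union bound with the Gaussian tail estimate P(|Z| > x) <= 2 exp(-x^2 / 2).
union_bound = 2.0 * n ** (1.0 - c ** 2 / 2.0)

print(f"empirical = {empirical:.5f}, union bound = {union_bound:.5f}")
```

For $c^2 > 2$ (with unit variance bound) the union bound tends to zero as $n$ grows, which is the mechanism behind the sets $B_n$ used in the proofs of Theorems 4.3 and 4.4.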